Model CLARA-FTSMC Indeks BRENT
Step 1: Import Library¶
In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from collections import defaultdict
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn_extra.cluster import KMedoids
from sklearn.metrics import silhouette_score, davies_bouldin_score
import warnings
warnings.filterwarnings('ignore')
Step 2: Import Dataset¶
In [2]:
df = pd.read_csv("Brent Oil Futures Historical Data.csv")
df
Out[2]:
| Date | Price | Open | High | Low | Vol. | Change % | |
|---|---|---|---|---|---|---|---|
| 0 | 07/31/2025 | 72.53 | 73.53 | 73.53 | 72.40 | 16.02K | -0.97% |
| 1 | 07/30/2025 | 73.24 | 72.57 | 73.63 | 71.75 | 111.92K | 1.01% |
| 2 | 07/29/2025 | 72.51 | 70.28 | 73.08 | 69.86 | 136.65K | 3.53% |
| 3 | 07/28/2025 | 70.04 | 68.54 | 70.45 | 68.35 | 203.44K | 2.34% |
| 4 | 07/25/2025 | 68.44 | 69.36 | 69.86 | 68.31 | 166.66K | 0.12% |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1286 | 08/07/2020 | 44.40 | 45.15 | 45.30 | 44.27 | 171.72K | -1.53% |
| 1287 | 08/06/2020 | 45.09 | 45.32 | 45.71 | 44.85 | 184.52K | -0.18% |
| 1288 | 08/05/2020 | 45.17 | 44.28 | 46.23 | 44.24 | 246.98K | 1.67% |
| 1289 | 08/04/2020 | 44.43 | 43.84 | 44.82 | 43.24 | 227.12K | 0.63% |
| 1290 | 08/03/2020 | 44.15 | 43.53 | 44.46 | 42.89 | 198.70K | 1.96% |
1291 rows × 7 columns
Step 3: Data Prepocessing¶
In [3]:
# Hapus Kolom yang tidak digunakan
df = df.drop(columns=["Open","High","Low","Vol.","Change %"])
In [4]:
# Ubah Kolom Date Menjadi Datetime
df["Date"]=pd.to_datetime(df["Date"])
In [5]:
# Sort data berdasarkan Date dan reset index data
df = df.sort_values(by='Date', ascending=True).reset_index(drop=True)
In [6]:
df
Out[6]:
| Date | Price | |
|---|---|---|
| 0 | 2020-08-03 | 44.15 |
| 1 | 2020-08-04 | 44.43 |
| 2 | 2020-08-05 | 45.17 |
| 3 | 2020-08-06 | 45.09 |
| 4 | 2020-08-07 | 44.40 |
| ... | ... | ... |
| 1286 | 2025-07-25 | 68.44 |
| 1287 | 2025-07-28 | 70.04 |
| 1288 | 2025-07-29 | 72.51 |
| 1289 | 2025-07-30 | 73.24 |
| 1290 | 2025-07-31 | 72.53 |
1291 rows × 2 columns
In [7]:
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1291 entries, 0 to 1290 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Date 1291 non-null datetime64[ns] 1 Price 1291 non-null float64 dtypes: datetime64[ns](1), float64(1) memory usage: 20.3 KB
Step 4: Exploratory Data Analysis (EDA)¶
In [8]:
# Deskripsi Data
df.describe()
Out[8]:
| Date | Price | |
|---|---|---|
| count | 1291 | 1291.000000 |
| mean | 2023-01-30 16:21:33.880712704 | 78.311650 |
| min | 2020-08-03 00:00:00 | 37.460000 |
| 25% | 2021-11-01 12:00:00 | 71.055000 |
| 50% | 2023-01-31 00:00:00 | 78.360000 |
| 75% | 2024-05-01 12:00:00 | 85.885000 |
| max | 2025-07-31 00:00:00 | 127.980000 |
| std | NaN | 16.100662 |
In [9]:
# Plot Data
plt.figure(figsize=(20, 6))
df.plot(x="Date", y='Price', kind='line', color='purple')
plt.title('Brent Oil Indeks Price')
plt.xlabel('Date')
plt.ylabel('Price')
plt.grid(True)
plt.show()
<Figure size 2000x600 with 0 Axes>
Step 5: Build Model CLARA-FTSMC¶
5.1. Persiapan Data¶
In [10]:
# 1. DATA SPLITTING (80% Training, 20% Testing)
split_ratio = 0.8
split_index = int(len(df) * split_ratio)
df_train = df.iloc[:split_index].copy().reset_index(drop=True)
df_test = df.iloc[split_index:].copy().reset_index(drop=True)
print(f"\n{'='*70}")
print("📊 DATA SPLITTING STRATEGY")
print(f"{'='*70}")
print(f"Total Data Points : {len(df)}")
print(f"Training Set (80%) : {len(df_train)} records")
print(f" └─ Date Range : {df_train['Date'].min()} to {df_train['Date'].max()}")
print(f"Testing Set (20%) : {len(df_test)} records")
print(f" └─ Date Range : {df_test['Date'].min()} to {df_test['Date'].max()}")
print(f"{'='*70}\n")
====================================================================== 📊 DATA SPLITTING STRATEGY ====================================================================== Total Data Points : 1291 Training Set (80%) : 1032 records └─ Date Range : 2020-08-03 00:00:00 to 2024-07-30 00:00:00 Testing Set (20%) : 259 records └─ Date Range : 2024-07-31 00:00:00 to 2025-07-31 00:00:00 ======================================================================
In [11]:
# 2. NORMALISASI (HANYA DARI TRAINING DATA)
features = ['Price']
X_train = df_train[features].values
X_test = df_test[features].values
# PENTING: Fit scaler HANYA pada training data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test) # Transform menggunakan parameter dari training
feature_ranges_training = [(X_train_scaled[:, i].min(), X_train_scaled[:, i].max())
for i in range(X_train_scaled.shape[1])]
feature_ranges_testing = [(X_test_scaled[:, i].min(), X_test_scaled[:, i].max())
for i in range(X_test_scaled.shape[1])]
print("✅ Normalization Complete (using training data statistics)")
print(f"\nTraining - Mean: {X_train_scaled[:, 0].mean():.10f}, Std: {X_train_scaled[:, 0].std():.4f}")
print(f"Feature ranges Training: {feature_ranges_training}\n")
print(f"\nTesting - Mean: {X_test_scaled[:, 0].mean():.10f}, Std: {X_test_scaled[:, 0].std():.4f}")
print(f"Feature ranges Testing: {feature_ranges_testing}\n")
✅ Normalization Complete (using training data statistics) Training - Mean: 0.0000000000, Std: 1.0000 Feature ranges Training: [(-2.4186652442853105, 2.7491944532213526)] Testing - Mean: -0.4307237261, Std: 0.2747 Feature ranges Testing: [(-1.1187076071639375, 0.14128646623389784)]
In [12]:
plt.figure(figsize=(10, 6))
# Plot Training data
plt.plot(df_train['Date'], df_train['Price'],
color='purple', label='Training Data')
# Plot Testing data
plt.plot(df_test['Date'], df_test['Price'],
color='orange', label='Testing Data')
# Garis vertikal batas split
split_date = df_test['Date'].iloc[0]
plt.axvline(split_date, color='black', linestyle='--', linewidth=2)
# Title dan dekorasi
plt.title('Brent Oil Indeks Price (Training vs Testing)')
plt.xlabel('Date')
plt.ylabel('Price')
plt.grid(True)
plt.legend()
plt.show()
5.2. Clustering Large Application (Using Gap Statistic for Optimal Cluster)¶
5.2.1 Input Parameter Awal¶
In [13]:
# Parameter Gap Statistic
k_max = 20
B = 100
sample_size_base = 40
print(f"⚙️ Gap Statistics Parameters:")
print(f" k_max = {k_max}")
print(f" Bootstrap samples (B) = {B}")
⚙️ Gap Statistics Parameters: k_max = 20 Bootstrap samples (B) = 100
5.2.2 Hitung W_k Untuk Data Training¶
In [14]:
def calculate_wk_clara(X, k, sample_size, n_samples=5, random_state=42):
"""
Hitung Within-cluster Sum of Squares (W_k) menggunakan CLARA.
CLARA = Clustering LARge Applications (menggunakan sampling + PAM)
- Menjalankan PAM pada beberapa sample (default = 5)
- Memilih model dengan inertia terkecil.
"""
np.random.seed(random_state)
best_inertia = np.inf
best_model = None
for s in range(n_samples):
# Ambil random sample
sample_idx = np.random.choice(len(X), size=min(sample_size, len(X)), replace=False)
X_sample = X[sample_idx]
# Jalankan PAM (KMedoids)
kmedoids = KMedoids(
n_clusters=k,
method='pam',
init='k-medoids++',
max_iter=300,
random_state=random_state + s
)
kmedoids.fit(X_sample)
# Prediksi seluruh data (label berdasarkan medoid terdekat)
labels = kmedoids.predict(X)
# Hitung inertia (W_k) untuk seluruh data
inertia = 0
for i in range(k):
cluster_points = X[labels == i]
if len(cluster_points) > 0:
medoid = kmedoids.cluster_centers_[i]
inertia += np.sum((cluster_points - medoid) ** 2)
# Simpan hasil terbaik (inertia terkecil)
if inertia < best_inertia:
best_inertia = inertia
best_model = kmedoids
return best_inertia, best_model
In [15]:
wk_values = []
log_wk_values = []
clara_models = []
print("\n📈 Computing W_k for original data using CLARA:")
for k in range(1, k_max + 1):
sample_size = 40 + 2 * k # sesuai aturan CLARA
wk, clara_model = calculate_wk_clara(X_train_scaled, k=k, sample_size=sample_size, n_samples=5)
wk_values.append(wk)
log_wk_values.append(np.log(wk))
clara_models.append(clara_model)
print(f"k={k:2d} → W_k={wk:8.4f} → log(W_k)={np.log(wk):7.4f}")
📈 Computing W_k for original data using CLARA: k= 1 → W_k=1560.4995 → log(W_k)= 7.3528 k= 2 → W_k=466.8335 → log(W_k)= 6.1460 k= 3 → W_k=191.1700 → log(W_k)= 5.2532 k= 4 → W_k=105.3746 → log(W_k)= 4.6575 k= 5 → W_k= 72.7024 → log(W_k)= 4.2864 k= 6 → W_k= 47.5793 → log(W_k)= 3.8624 k= 7 → W_k= 44.4604 → log(W_k)= 3.7946 k= 8 → W_k= 27.3335 → log(W_k)= 3.3081 k= 9 → W_k= 29.0047 → log(W_k)= 3.3675 k=10 → W_k= 21.6057 → log(W_k)= 3.0730 k=11 → W_k= 18.0463 → log(W_k)= 2.8929 k=12 → W_k= 14.7932 → log(W_k)= 2.6942 k=13 → W_k= 11.9001 → log(W_k)= 2.4765 k=14 → W_k= 9.9076 → log(W_k)= 2.2933 k=15 → W_k= 9.3735 → log(W_k)= 2.2379 k=16 → W_k= 7.4367 → log(W_k)= 2.0064 k=17 → W_k= 6.8781 → log(W_k)= 1.9283 k=18 → W_k= 6.8364 → log(W_k)= 1.9223 k=19 → W_k= 6.2775 → log(W_k)= 1.8370 k=20 → W_k= 5.8529 → log(W_k)= 1.7669
5.2.3 Hitung W_kb, Sd,dan S_k untuk data referensi (Bootstarp)¶
In [16]:
def generate_reference_data(n_samples, feature_ranges):
"""Generate dataset referensi acak"""
n_features = len(feature_ranges)
X_ref = np.zeros((n_samples, n_features))
for i in range(n_features):
min_val, max_val = feature_ranges[i]
X_ref[:, i] = np.random.uniform(min_val, max_val, n_samples)
return X_ref
In [17]:
log_wk_star = np.zeros((k_max, B))
n_samples = X_train_scaled.shape[0]
print(f"\n🔄 Computing reference distributions (B={B} iterations):")
for k in range(1, k_max + 1):
print(f"-Processing k={k}... ", end="", flush=True)
for b in range(B):
X_ref = generate_reference_data(n_samples, feature_ranges_training)
wk_star, _ = calculate_wk_clara(X_ref, k=k, sample_size=sample_size)
log_wk_star[k-1, b] = np.log(wk_star)
print("✓")
🔄 Computing reference distributions (B=100 iterations): ✓Processing k=1... ✓Processing k=2... ✓Processing k=3... ✓Processing k=4... ✓Processing k=5... ✓Processing k=6... ✓Processing k=7... ✓Processing k=8... ✓Processing k=9... ✓Processing k=10... ✓Processing k=11... ✓Processing k=12... ✓Processing k=13... ✓Processing k=14... ✓Processing k=15... ✓Processing k=16... ✓Processing k=17... ✓Processing k=18... ✓Processing k=19... ✓Processing k=20...
In [18]:
# Hitung statistik dasar
E_log_wk_star = log_wk_star.mean(axis=1)
sd_k = log_wk_star.std(axis=1)
s_k = sd_k * np.sqrt(1 + 1/B)
# Hitung juga nilai rata-rata Wk* (bukan log)
E_wk_star = np.exp(E_log_wk_star)
# Buat DataFrame lengkap untuk statistik
gap_stats_df = pd.DataFrame({
'k': range(1, k_max + 1),
'W_k* (mean)': E_wk_star,
'log(W_k*) (mean)': E_log_wk_star,
'sd_k': sd_k,
's_k': s_k
})
print("\n" + "="*100)
print("📊 GAP STATISTIC SUMMARY (with W_k* and log(W_k*))")
print("="*100)
# Versi compact (dibulatkan agar mudah dibaca)
gap_stats_compact = pd.DataFrame({
'k': range(1, k_max + 1),
'W_k* (mean)': [f"{val:.4f}" for val in E_wk_star],
'log(W_k*) (mean)': [f"{val:.6f}" for val in E_log_wk_star],
'sd_k': [f"{val:.3e}" for val in sd_k],
's_k': [f"{val:.3e}" for val in s_k]
})
print(gap_stats_compact.to_string(index=False))
print("="*100)
==================================================================================================== 📊 GAP STATISTIC SUMMARY (with W_k* and log(W_k*)) ==================================================================================================== k W_k* (mean) log(W_k*) (mean) sd_k s_k 1 2255.6779 7.721206 8.882e-16 8.926e-16 2 569.5843 6.344907 8.882e-16 8.926e-16 3 289.7639 5.669066 1.776e-15 1.785e-15 4 151.0842 5.017837 0.000e+00 0.000e+00 5 100.8699 4.613831 0.000e+00 0.000e+00 6 71.3087 4.267019 1.776e-15 1.785e-15 7 53.7480 3.984306 4.441e-16 4.463e-16 8 36.2340 3.589998 8.882e-16 8.926e-16 9 30.7793 3.426842 1.332e-15 1.339e-15 10 27.7738 3.324095 4.441e-16 4.463e-16 11 22.3327 3.106053 1.332e-15 1.339e-15 12 17.5228 2.863502 8.882e-16 8.926e-16 13 14.4335 2.669554 8.882e-16 8.926e-16 14 13.3206 2.589308 0.000e+00 0.000e+00 15 11.3862 2.432405 0.000e+00 0.000e+00 16 10.4235 2.344059 8.882e-16 8.926e-16 17 9.3490 2.235266 8.882e-16 8.926e-16 18 7.9954 2.078871 0.000e+00 0.000e+00 19 7.3506 1.994780 2.220e-16 2.232e-16 20 7.0214 1.948959 2.220e-16 2.232e-16 ====================================================================================================
5.2.4 Hitung Gap Statistic¶
In [19]:
gap_k = E_log_wk_star - np.array(log_wk_values)
# Tampilkan tabel
results_df = pd.DataFrame({
'k': range(1, k_max + 1),
'log(W_k)': log_wk_values,
'E{log(W_k*)}': E_log_wk_star,
'Gap(k)': gap_k,
's_k': s_k
})
print("\n" + "=" * 60)
print("Gap Statistic Results (using CLARA):")
print("=" * 60)
print(results_df.to_string(index=False))
============================================================
Gap Statistic Results (using CLARA):
============================================================
k log(W_k) E{log(W_k*)} Gap(k) s_k
1 7.352761 7.721206 0.368445 8.926083e-16
2 6.145973 6.344907 0.198934 8.926083e-16
3 5.253163 5.669066 0.415903 1.785217e-15
4 4.657522 5.017837 0.360315 0.000000e+00
5 4.286374 4.613831 0.327457 0.000000e+00
6 3.862398 4.267019 0.404621 1.785217e-15
7 3.794600 3.984306 0.189706 4.463041e-16
8 3.308111 3.589998 0.281886 8.926083e-16
9 3.367458 3.426842 0.059384 1.338912e-15
10 3.072955 3.324095 0.251140 4.463041e-16
11 2.892942 3.106053 0.213111 1.338912e-15
12 2.694166 2.863502 0.169335 8.926083e-16
13 2.476544 2.669554 0.193010 8.926083e-16
14 2.293305 2.589308 0.296003 0.000000e+00
15 2.237888 2.432405 0.194517 0.000000e+00
16 2.006422 2.344059 0.337638 8.926083e-16
17 1.928341 2.235266 0.306925 8.926083e-16
18 1.922262 2.078871 0.156608 0.000000e+00
19 1.836973 1.994780 0.157807 2.231521e-16
20 1.766935 1.948959 0.182024 2.231521e-16
5.2.5 Tentukan k Optimal¶
In [20]:
k_optimal = None
print(f"\n{'k':<5} {'Gap(k)':<12} {'Gap(k+1)-s(k+1)':<18} {'Criterion':<12}")
print("-" * 50)
for k in range(1, k_max):
gap_current = gap_k[k-1]
gap_next_minus_s = gap_k[k] - s_k[k]
meets_criteria = gap_current >= gap_next_minus_s
print(f"{k:<5} {gap_current:<12.4f} {gap_next_minus_s:<18.4f} {'✓ YES' if meets_criteria else '✗ NO':<12}")
# Pilih k pertama yang memenuhi kriteria (tanpa batas minimal)
if meets_criteria and k_optimal is None:
k_optimal = k
# Jika tidak ada k yang memenuhi kriteria
if k_optimal is None:
# Pilih k dengan Gap tertinggi (tanpa batasan)
k_optimal = np.argmax(gap_k) + 1
print(f"\n⚠️ No k meets criterion. Selecting k = {k_optimal} with highest Gap.")
print(f"\n🎯 OPTIMAL k = {k_optimal} (determined using Gap Statistic)")
print(f" Gap({k_optimal}) = {gap_k[k_optimal-1]:.4f}")
k Gap(k) Gap(k+1)-s(k+1) Criterion -------------------------------------------------- 1 0.3684 0.1989 ✓ YES 2 0.1989 0.4159 ✗ NO 3 0.4159 0.3603 ✓ YES 4 0.3603 0.3275 ✓ YES 5 0.3275 0.4046 ✗ NO 6 0.4046 0.1897 ✓ YES 7 0.1897 0.2819 ✗ NO 8 0.2819 0.0594 ✓ YES 9 0.0594 0.2511 ✗ NO 10 0.2511 0.2131 ✓ YES 11 0.2131 0.1693 ✓ YES 12 0.1693 0.1930 ✗ NO 13 0.1930 0.2960 ✗ NO 14 0.2960 0.1945 ✓ YES 15 0.1945 0.3376 ✗ NO 16 0.3376 0.3069 ✓ YES 17 0.3069 0.1566 ✓ YES 18 0.1566 0.1578 ✗ NO 19 0.1578 0.1820 ✗ NO 🎯 OPTIMAL k = 1 (determined using Gap Statistic) Gap(1) = 0.3684
In [21]:
plt.figure(figsize=(10, 6))
k_range = range(1, k_max)
gap_current = gap_k[:-1]
gap_next_minus_s = gap_k[1:] - s_k[1:]
plt.plot(k_range, gap_current, 'o-', linewidth=2, markersize=8, color='#4A4E69', label='Gap(k)')
plt.plot(k_range, gap_next_minus_s, 's-', linewidth=2, markersize=8, color='#F2CC8F', label='Gap(k+1) - s(k+1)')
for k in k_range:
if gap_current[k-1] >= gap_next_minus_s[k-1]:
plt.plot(k, gap_current[k-1], 'g*', markersize=15)
if k == k_optimal:
plt.annotate(f'k={k}\n(Optimal)', xy=(k, gap_current[k-1]),
xytext=(k+1, gap_current[k-1]+0.02),
fontsize=10, fontweight='bold', color='green',
bbox=dict(boxstyle='round,pad=0.3', facecolor='lightgreen', alpha=0.7),
arrowprops=dict(arrowstyle='->', color='green', lw=1.5))
plt.axvline(x=k_optimal, color='red', linestyle='--', linewidth=2, alpha=0.5)
plt.xlabel('Number of Clusters (k)', fontsize=12, fontweight='bold')
plt.ylabel('Gap Value', fontsize=12, fontweight='bold')
plt.title('Gap Statistic Criterion Check', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, linestyle='--')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()
In [22]:
# ================================
# Step 1: Gap Statistic memberikan kandidat
# ================================
valid_k_candidates = [3,4,6,8,10,11,14,16,17]
# ================================
# Step 2: Filter dengan constraint praktis
# ================================
min_k, max_k = 2, 20
filtered_k = [k for k in valid_k_candidates if min_k <= k <= max_k]
print(f"📋 Filtered k candidates: {filtered_k}")
print(f" Total: {len(filtered_k)} values\n")
# ================================
# Step 3: Evaluasi dengan multiple metrics menggunakan CLARA
# ================================
results = []
print("🔄 Evaluating metrics for each k using CLARA...")
print("-" * 60)
for k in filtered_k:
print(f"-Processing k={k}... ", end="", flush=True)
# Gunakan CLARA (KMedoids dengan sampling untuk data besar)
if len(X_train_scaled) > sample_size and k < len(X_train_scaled):
np.random.seed(42)
sample_idx = np.random.choice(len(X_train_scaled), size=sample_size, replace=False)
X_sample = X_train_scaled[sample_idx]
clara = KMedoids(n_clusters=k, method='pam', init='k-medoids++',
max_iter=300, random_state=42)
clara.fit(X_sample)
labels = clara.predict(X_train_scaled)
else:
clara = KMedoids(n_clusters=k, method='pam', init='k-medoids++',
max_iter=300, random_state=42)
labels = clara.fit_predict(X_scaled)
# Hitung metrics (hanya jika k > 1 untuk silhouette)
if k > 1:
sil_score = silhouette_score(X_train_scaled, labels)
db_score = davies_bouldin_score(X_train_scaled, labels)
else:
sil_score = np.nan
db_score = np.nan
results.append({
'k': k,
'gap': gap_k[k-1],
'silhouette': sil_score,
'davies_bouldin': db_score
})
print("✓")
# Ubah ke DataFrame untuk tampilan rapi
df_metrics = pd.DataFrame(results)
print("\n✅ Processing Complete!")
📋 Filtered k candidates: [3, 4, 6, 8, 10, 11, 14, 16, 17] Total: 9 values 🔄 Evaluating metrics for each k using CLARA... ------------------------------------------------------------ ✓Processing k=3... ✓Processing k=4... ✓Processing k=6... ✓Processing k=8... ✓Processing k=10... ✓Processing k=11... ✓Processing k=14... ✓Processing k=16... ✓Processing k=17... ✅ Processing Complete!
In [23]:
# ================================
# Gap Statistic Table
# ================================
print("\n" + "=" * 60)
print("📊 GAP STATISTIC VALUES")
print("=" * 60)
gap_table = df_metrics[['k', 'gap']].copy()
gap_table = gap_table.sort_values('k')
print(gap_table.to_string(index=False, float_format='%.4f'))
best_gap_k = df_metrics.loc[df_metrics['gap'].idxmax(), 'k']
max_gap = df_metrics['gap'].max()
print(f"\n🔹 Highest Gap Statistic: k = {best_gap_k} (Gap = {max_gap:.4f})")
print("=" * 60)
============================================================ 📊 GAP STATISTIC VALUES ============================================================ k gap 3 0.4159 4 0.3603 6 0.4046 8 0.2819 10 0.2511 11 0.2131 14 0.2960 16 0.3376 17 0.3069 🔹 Highest Gap Statistic: k = 3 (Gap = 0.4159) ============================================================
In [24]:
# ================================
# Gap Statistic Visualization
# ================================
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.plot(df_metrics['k'], df_metrics['gap'], 'bo-', linewidth=2, markersize=8)
plt.axvline(x=best_gap_k, color='r', linestyle='--', label=f'Best k = {best_gap_k}')
plt.xlabel('Number of Clusters (k)', fontsize=12)
plt.ylabel('Gap Statistic', fontsize=12)
plt.title('Gap Statistic vs Number of Clusters', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()
In [25]:
# ================================
# Silhouette Score Table
# ================================
print("\n" + "=" * 60)
print("📊 SILHOUETTE SCORE VALUES")
print("=" * 60)
df_valid = df_metrics[df_metrics['k'] > 1].copy() # Exclude k=1
sil_table = df_valid[['k', 'silhouette']].copy()
sil_table = sil_table.sort_values('k')
print(sil_table.to_string(index=False, float_format='%.4f'))
best_sil_k = df_valid.loc[df_valid['silhouette'].idxmax(), 'k']
max_sil = df_valid['silhouette'].max()
print(f"\n🔹 Highest Silhouette Score: k = {best_sil_k} (Silhouette = {max_sil:.4f})")
print("\n📖 Interpretation:")
print(" > 0.7 = Strong structure")
print(" 0.5-0.7 = Reasonable structure")
print(" < 0.5 = Weak structure")
print("=" * 60)
============================================================ 📊 SILHOUETTE SCORE VALUES ============================================================ k silhouette 3 0.4749 4 0.5775 6 0.5566 8 0.5409 10 0.5489 11 0.5443 14 0.5271 16 0.5095 17 0.5126 🔹 Highest Silhouette Score: k = 4 (Silhouette = 0.5775) 📖 Interpretation: > 0.7 = Strong structure 0.5-0.7 = Reasonable structure < 0.5 = Weak structure ============================================================
In [26]:
# ================================
# Silhouette Score Visualization
# ================================
plt.figure(figsize=(10, 6))
plt.plot(df_valid['k'], df_valid['silhouette'], 'go-', linewidth=2, markersize=8)
plt.axvline(x=best_sil_k, color='r', linestyle='--', label=f'Best k = {best_sil_k}')
plt.axhline(y=0.7, color='g', linestyle=':', alpha=0.5, label='Strong (>0.7)')
plt.axhline(y=0.5, color='orange', linestyle=':', alpha=0.5, label='Reasonable (>0.5)')
plt.xlabel('Number of Clusters (k)', fontsize=12)
plt.ylabel('Silhouette Score', fontsize=12)
plt.title('Silhouette Score vs Number of Clusters', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()
In [27]:
# ================================
# Davies-Bouldin Index Table
# ================================
print("\n" + "=" * 60)
print("📊 DAVIES-BOULDIN INDEX VALUES")
print("=" * 60)
db_table = df_valid[['k', 'davies_bouldin']].copy()
db_table = db_table.sort_values('k')
print(db_table.to_string(index=False, float_format='%.4f'))
best_db_k = df_valid.loc[df_valid['davies_bouldin'].idxmin(), 'k']
min_db = df_valid['davies_bouldin'].min()
print(f"\n🔹 Lowest Davies-Bouldin Index: k = {best_db_k} (DB = {min_db:.4f})")
print("\n📖 Interpretation:")
print(" Lower values indicate better cluster separation")
print("=" * 60)
============================================================ 📊 DAVIES-BOULDIN INDEX VALUES ============================================================ k davies_bouldin 3 0.5751 4 0.4875 6 0.4882 8 0.5088 10 0.5120 11 0.4994 14 0.5287 16 0.5253 17 0.5301 🔹 Lowest Davies-Bouldin Index: k = 4 (DB = 0.4875) 📖 Interpretation: Lower values indicate better cluster separation ============================================================
In [28]:
# ================================
# Davies-Bouldin Index Visualization
# ================================
plt.figure(figsize=(10, 6))
plt.plot(df_valid['k'], df_valid['davies_bouldin'], 'ro-', linewidth=2, markersize=8)
plt.axvline(x=best_db_k, color='darkred', linestyle='--', label=f'Best k = {best_db_k}')
plt.xlabel('Number of Clusters (k)', fontsize=12)
plt.ylabel('Davies-Bouldin Index', fontsize=12)
plt.title('Davies-Bouldin Index vs Number of Clusters', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()
In [29]:
# ================================
# Best k per Metric & Voting
# ================================
print("\n" + "=" * 60)
print("🎯 BEST k PER METRIC (CLARA METHOD)")
print("=" * 60)
print(f"🔹 Highest Gap Statistic : k = {best_gap_k:2d} (Gap = {df_metrics['gap'].max():.4f})")
print(f"🔹 Highest Silhouette Score : k = {best_sil_k:2d} (Silhouette = {df_valid['silhouette'].max():.4f})")
print(f"🔹 Lowest Davies-Bouldin Index : k = {best_db_k:2d} (DB = {df_valid['davies_bouldin'].min():.4f})")
print("=" * 60)
# ================================
# Voting System
# ================================
votes = {k: 0 for k in filtered_k}
# Berikan vote
votes[best_gap_k] += 1
votes[best_sil_k] += 1
votes[best_db_k] += 1
# Cari k dengan vote terbanyak
max_votes = max(votes.values())
recommended_k = [k for k, v in votes.items() if v == max_votes]
print(f"\n🗳️ VOTING RESULTS:")
for k in sorted(votes.keys()):
print(f" k={k:2d} : {'⭐' * votes[k]} ({votes[k]} votes)")
print(f"\n🏆 VOTING WINNER: k = {recommended_k[0] if len(recommended_k) == 1 else recommended_k}")
if len(recommended_k) > 1:
print(f" (Multiple k tied with {max_votes} votes)")
print("=" * 60)
============================================================ 🎯 BEST k PER METRIC (CLARA METHOD) ============================================================ 🔹 Highest Gap Statistic : k = 3 (Gap = 0.4159) 🔹 Highest Silhouette Score : k = 4 (Silhouette = 0.5775) 🔹 Lowest Davies-Bouldin Index : k = 4 (DB = 0.4875) ============================================================ 🗳️ VOTING RESULTS: k= 3 : ⭐ (1 votes) k= 4 : ⭐⭐ (2 votes) k= 6 : (0 votes) k= 8 : (0 votes) k=10 : (0 votes) k=11 : (0 votes) k=14 : (0 votes) k=16 : (0 votes) k=17 : (0 votes) 🏆 VOTING WINNER: k = 4 ============================================================
In [30]:
# ================================
# Aggregate Ranking (Normalized Scores)
# ================================
print("\n" + "=" * 60)
print("🔢 AGGREGATE RANKING (Normalized Scores)")
print("=" * 60)
# Normalize semua metrik ke range [0, 1]
df_rank = df_valid.copy()
df_rank['gap_norm'] = (df_rank['gap'] - df_rank['gap'].min()) / (df_rank['gap'].max() - df_rank['gap'].min())
df_rank['sil_norm'] = (df_rank['silhouette'] - df_rank['silhouette'].min()) / (df_rank['silhouette'].max() - df_rank['silhouette'].min())
df_rank['db_norm'] = 1 - (df_rank['davies_bouldin'] - df_rank['davies_bouldin'].min()) / (df_rank['davies_bouldin'].max() - df_rank['davies_bouldin'].min())
# Aggregate score dari 3 metrik
df_rank['aggregate_score'] = (df_rank['gap_norm'] + df_rank['sil_norm'] + df_rank['db_norm']) / 3
# Urutkan berdasarkan skor total
df_rank = df_rank.sort_values('aggregate_score', ascending=False)
print(df_rank[['k', 'gap_norm', 'sil_norm', 'db_norm', 'aggregate_score']].to_string(index=False, float_format='%.4f'))
best_aggregate_k = df_rank.iloc[0]['k']
print(f"\n🥇 BEST k BY AGGREGATE SCORE: k = {int(best_aggregate_k)} (Score: {df_rank.iloc[0]['aggregate_score']:.4f})")
# ================================
# Final Recommendation
# ================================
print("\n" + "=" * 60)
print("💡 FINAL RECOMMENDATION")
print("=" * 60)
print(f"Method: CLARA (Clustering Large Applications)")
print(f"Sample Size: {sample_size}")
print(f"\nBased on comprehensive evaluation:")
print(f" • Voting Winner: k = {recommended_k[0] if len(recommended_k) == 1 else recommended_k}")
print(f" • Aggregate Score Winner: k = {int(best_aggregate_k)}")
print(f" • Gap Statistic Optimal: k = {k_optimal}")
print(f"\n✅ RECOMMENDED k = {int(best_aggregate_k)}")
print("=" * 60)
============================================================ 🔢 AGGREGATE RANKING (Normalized Scores) ============================================================ k gap_norm sil_norm db_norm aggregate_score 6 0.9444 0.7968 0.9922 0.9111 4 0.7259 1.0000 1.0000 0.9086 8 0.3391 0.6435 0.7577 0.5801 10 0.1875 0.7212 0.7209 0.5432 11 0.0000 0.6769 0.8651 0.5140 16 0.6141 0.3369 0.5694 0.5068 14 0.4088 0.5092 0.5306 0.4829 17 0.4626 0.3677 0.5146 0.4483 3 1.0000 0.0000 0.0000 0.3333 🥇 BEST k BY AGGREGATE SCORE: k = 6 (Score: 0.9111) ============================================================ 💡 FINAL RECOMMENDATION ============================================================ Method: CLARA (Clustering Large Applications) Sample Size: 80 Based on comprehensive evaluation: • Voting Winner: k = 4 • Aggregate Score Winner: k = 6 • Gap Statistic Optimal: k = 1 ✅ RECOMMENDED k = 6 ============================================================
5.2.6. Visualisasi Lainnya¶
In [31]:
k_optimal = 6
plt.style.use('default')
sns.set_palette("husl")
# =====================================================================
# STEP 1: Fit CLARA dengan k optimal
# =====================================================================
print(f"\n🎯 Fitting CLARA with optimal k = {k_optimal}")
if len(X_train_scaled) > sample_size and k_optimal < len(X_train_scaled):
np.random.seed(42)
sample_idx = np.random.choice(len(X_train_scaled), size=sample_size, replace=False)
X_sample = X_train_scaled[sample_idx]
clara_final = KMedoids(n_clusters=k_optimal, method='pam', init='k-medoids++',
max_iter=300, random_state=42)
clara_final.fit(X_sample)
labels_final = clara_final.predict(X_train_scaled)
else:
clara_final = KMedoids(n_clusters=k_optimal, method='pam', init='k-medoids++',
max_iter=300, random_state=42)
labels_final = clara_final.fit_predict(X_train_scaled)
medoids_scaled = clara_final.cluster_centers_
medoids_original = scaler.inverse_transform(medoids_scaled)
# =====================================================================
# Urutkan Cluster Berdasarkan Medoid
# =====================================================================
medoid_values = medoids_original[:, 0]
sorted_indices = np.argsort(medoid_values)
cluster_mapping = {old_idx: new_idx for new_idx, old_idx in enumerate(sorted_indices)}
labels_final_sorted = np.array([cluster_mapping[label] for label in labels_final])
medoids_scaled_sorted = medoids_scaled[sorted_indices]
medoids_original_sorted = medoids_original[sorted_indices]
df_train['Cluster'] = labels_final_sorted
df_train['Price_Scaled'] = X_train_scaled[:, 0]
sil_score = silhouette_score(X_train_scaled, labels_final_sorted)
print(f"✓ Clustering complete!")
print(f" Silhouette Score: {sil_score:.4f}")
colors = plt.cm.tab20(np.linspace(0, 1, k_optimal))
🎯 Fitting CLARA with optimal k = 6 ✓ Clustering complete! Silhouette Score: 0.5566
In [32]:
# =====================================================================
# PLOT 1: ELBOW METHOD - W_k values
# =====================================================================
plt.figure(figsize=(10, 6))
plt.plot(range(1, k_max + 1), wk_values, 'o-', linewidth=2, markersize=8, color='#2E86AB')
plt.axvline(x=k_optimal, color='red', linestyle='--', linewidth=2, alpha=0.7, label=f'Optimal k={k_optimal}')
plt.xlabel('Number of Clusters (k)', fontsize=12, fontweight='bold')
plt.ylabel('W_k values', fontsize=12, fontweight='bold')
plt.title('Elbow Method - W_k value (CLARA)', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, linestyle='--')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()
In [33]:
# =====================================================================
# PLOT 2: LOG(W_k) COMPARISON (Observed vs Expected)
# =====================================================================
plt.figure(figsize=(10, 6))
plt.plot(range(1, k_max + 1), log_wk_values, 'o-', linewidth=2, markersize=8,
color='#A23B72', label='log(W_k) - Observed')
plt.plot(range(1, k_max + 1), E_log_wk_star, 's-', linewidth=2, markersize=8,
color='#F18F01', label='E[log(W_k*)] - Expected')
plt.axvline(x=k_optimal, color='red', linestyle='--', linewidth=2, alpha=0.7, label=f'Optimal k={k_optimal}')
plt.xlabel('Number of Clusters (k)', fontsize=12, fontweight='bold')
plt.ylabel('log(W_k)', fontsize=12, fontweight='bold')
plt.title('Observed vs Expected log(W_k)', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, linestyle='--')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()
In [34]:
# =====================================================================
# PLOT 3: GAP STATISTIC VALUES
# =====================================================================
plt.figure(figsize=(10, 6))
plt.plot(range(1, k_max + 1), gap_k, 'o-', linewidth=2.5, markersize=10, color='#06A77D')
plt.axvline(x=k_optimal, color='red', linestyle='--', linewidth=2, alpha=0.7, label=f'Optimal k={k_optimal}')
optimal_gap = gap_k[k_optimal - 1]
plt.plot(k_optimal, optimal_gap, 'r*', markersize=20, label=f'Gap({k_optimal})={optimal_gap:.3f}')
plt.xlabel('Number of Clusters (k)', fontsize=12, fontweight='bold')
plt.ylabel('Gap Statistic', fontsize=12, fontweight='bold')
plt.title('Gap Statistic Values (CLARA)', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, linestyle='--')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()
In [35]:
# =====================================================================
# PLOT 4: GAP WITH ERROR BARS (s_k)
# =====================================================================
plt.figure(figsize=(10, 6))
plt.errorbar(range(1, k_max + 1), gap_k, yerr=s_k, fmt='o-', linewidth=2,
markersize=8, capsize=5, capthick=2, color='#C73E1D', ecolor='gray', alpha=0.8)
plt.axvline(x=k_optimal, color='red', linestyle='--', linewidth=2, alpha=0.7, label=f'Optimal k={k_optimal}')
plt.xlabel('Number of Clusters (k)', fontsize=12, fontweight='bold')
plt.ylabel('Gap(k) ± s_k', fontsize=12, fontweight='bold')
plt.title('Gap Statistic with Standard Error', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, linestyle='--')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()
In [36]:
# =====================================================================
# PLOT 5: CRITERION EVALUATION
# =====================================================================
plt.figure(figsize=(10, 6))
k_range = range(1, k_max)
gap_current = gap_k[:-1]
gap_next_minus_s = gap_k[1:] - s_k[1:]
plt.plot(k_range, gap_current, 'o-', linewidth=2, markersize=8, color='#4A4E69', label='Gap(k)')
plt.plot(k_range, gap_next_minus_s, 's-', linewidth=2, markersize=8, color='#F2CC8F', label='Gap(k+1) - s(k+1)')
for k in k_range:
if gap_current[k-1] >= gap_next_minus_s[k-1]:
plt.plot(k, gap_current[k-1], 'g*', markersize=15)
if k == k_optimal:
plt.annotate(f'k={k}\n(Optimal)', xy=(k, gap_current[k-1]),
xytext=(k+1, gap_current[k-1]+0.02),
fontsize=10, fontweight='bold', color='green',
bbox=dict(boxstyle='round,pad=0.3', facecolor='lightgreen', alpha=0.7),
arrowprops=dict(arrowstyle='->', color='green', lw=1.5))
plt.axvline(x=k_optimal, color='red', linestyle='--', linewidth=2, alpha=0.5)
plt.xlabel('Number of Clusters (k)', fontsize=12, fontweight='bold')
plt.ylabel('Gap Value', fontsize=12, fontweight='bold')
plt.title('Gap Statistic Criterion Check', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, linestyle='--')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()
In [37]:
# =====================================================================
# PLOT 6: STANDARD ERROR (s_k)
# =====================================================================
plt.figure(figsize=(10, 6))
plt.plot(range(1, k_max + 1), s_k, 'o-', linewidth=2, markersize=8, color='#8338EC')
plt.axvline(x=k_optimal, color='red', linestyle='--', linewidth=2, alpha=0.7, label=f'Optimal k={k_optimal}')
plt.xlabel('Number of Clusters (k)', fontsize=12, fontweight='bold')
plt.ylabel('Standard Error (s_k)', fontsize=12, fontweight='bold')
plt.title('Standard Error Distribution', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, linestyle='--')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()
In [38]:
# =====================================================================
# PLOT 7: RESULTS TABLE
# =====================================================================
fig, ax = plt.subplots(figsize=(14, 8))
ax.axis('off')
table_data = []
for k in range(1, min(k_max + 1, 16)):
criterion = '✓' if (k < k_max and gap_k[k-1] >= (gap_k[k] - s_k[k]) and k >= 3) else '✗'
table_data.append([
k,
f"{wk_values[k-1]:.2f}",
f"{log_wk_values[k-1]:.4f}",
f"{E_log_wk_star[k-1]:.4f}",
f"{gap_k[k-1]:.4f}",
f"{s_k[k-1]:.4f}",
criterion
])
table = ax.table(cellText=table_data,
colLabels=['k', 'W_k', 'log(W_k)', 'E[log(W_k*)]', 'Gap(k)', 's_k', 'Criterion'],
cellLoc='center',
loc='center',
bbox=[0, 0, 1, 1])
table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1, 2)
for i in range(7):
table[(0, i)].set_facecolor('#34495e')
table[(0, i)].set_text_props(weight='bold', color='white')
for i in range(7):
table[(k_optimal, i)].set_facecolor('#90EE90')
table[(k_optimal, i)].set_text_props(weight='bold')
ax.set_title('Gap Statistic Summary Table (CLARA Method)',
fontsize=16, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()
In [39]:
# =====================================================================
# PLOT 8: TIME SERIES dengan CLUSTER COLORING
# =====================================================================
plt.figure(figsize=(16, 6))
for i in range(k_optimal):
cluster_data = df_train[df_train['Cluster'] == i]
plt.scatter(cluster_data['Date'], cluster_data['Price'],
label=f'Cluster {i} (${medoids_original_sorted[i, 0]:.2f})',
color=colors[i], alpha=0.6, s=30)
for i in range(k_optimal):
cluster_data = df_train[df_train['Cluster'] == i]
medoid_price = medoids_original_sorted[i, 0]
median_date = cluster_data['Date'].iloc[len(cluster_data)//2]
plt.scatter(median_date, medoid_price, color=colors[i],
s=300, marker='*', edgecolors='black', linewidths=2, zorder=5)
plt.xlabel('Date', fontsize=12, fontweight='bold')
plt.ylabel('Price ($)', fontsize=12, fontweight='bold')
plt.title(f'Time Series with CLARA Clustering (k={k_optimal}, Ordered by Medoid)',
fontsize=14, fontweight='bold')
plt.legend(loc='upper right', fontsize=10, frameon=True)
plt.grid(True, alpha=0.3, linestyle='--')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
In [40]:
# =====================================================================
# PLOT 9: PRICE DISTRIBUTION per CLUSTER (Box Plot)
# =====================================================================
plt.figure(figsize=(10, 6))
cluster_data_list = [df_train[df_train['Cluster'] == i]['Price'].values for i in range(k_optimal)]
bp = plt.boxplot(cluster_data_list, labels=[f'C{i}' for i in range(k_optimal)],
patch_artist=True, widths=0.6)
for patch, color in zip(bp['boxes'], colors):
patch.set_facecolor(color)
patch.set_alpha(0.7)
for i in range(k_optimal):
plt.scatter(i+1, medoids_original_sorted[i, 0], color='red', s=150,
marker='D', edgecolors='black', linewidths=2, zorder=5, label='Medoid' if i==0 else '')
plt.xlabel('Cluster', fontsize=12, fontweight='bold')
plt.ylabel('Price ($)', fontsize=12, fontweight='bold')
plt.title('Price Distribution per Cluster (Ordered)', fontsize=14, fontweight='bold')
plt.grid(axis='y', alpha=0.3, linestyle='--')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()
In [41]:
# =====================================================================
# PLOT 10: HISTOGRAM STACKED
# =====================================================================
plt.figure(figsize=(10, 6))
for i in range(k_optimal):
cluster_prices = df_train[df_train['Cluster'] == i]['Price']
plt.hist(cluster_prices, bins=30, alpha=0.6, color=colors[i],
label=f'C{i} (${medoids_original_sorted[i, 0]:.0f})',
edgecolor='black', linewidth=0.5)
plt.xlabel('Price ($)', fontsize=12, fontweight='bold')
plt.ylabel('Frequency', fontsize=12, fontweight='bold')
plt.title('Price Histogram by Cluster', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(axis='y', alpha=0.3, linestyle='--')
plt.tight_layout()
plt.show()
In [42]:
# =====================================================================
# PLOT 11: CLUSTER SIZE (Pie Chart)
# =====================================================================
plt.figure(figsize=(8, 8))
cluster_sizes = [len(df_train[df_train['Cluster'] == i]) for i in range(k_optimal)]
wedges, texts, autotexts = plt.pie(cluster_sizes,
labels=[f'C{i}\n${medoids_original_sorted[i, 0]:.0f}'
for i in range(k_optimal)],
colors=colors, autopct='%1.1f%%', startangle=90)
for autotext in autotexts:
autotext.set_color('white')
autotext.set_fontweight('bold')
autotext.set_fontsize(10)
plt.title('Cluster Size Distribution', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
In [43]:
# =====================================================================
# PLOT 12: SCALED DATA VISUALIZATION (1D)
# =====================================================================
plt.figure(figsize=(10, 6))
for i in range(k_optimal):
cluster_data = df_train[df_train['Cluster'] == i]['Price_Scaled']
plt.scatter([i]*len(cluster_data), cluster_data, color=colors[i],
alpha=0.5, s=20)
plt.scatter(range(k_optimal), medoids_scaled_sorted[:, 0], color='red', s=200,
marker='D', edgecolors='black', linewidths=2, zorder=5, label='Medoids')
plt.xlabel('Cluster', fontsize=12, fontweight='bold')
plt.ylabel('Scaled Price', fontsize=12, fontweight='bold')
plt.title('Scaled Data Distribution (Ordered)', fontsize=14, fontweight='bold')
plt.xticks(range(k_optimal), [f'C{i}' for i in range(k_optimal)])
plt.grid(axis='y', alpha=0.3, linestyle='--')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()
In [44]:
# =====================================================================
# PLOT 13: CLUSTER STATISTICS TABLE
# =====================================================================
fig, ax = plt.subplots(figsize=(14, 6))
ax.axis('off')
stats_data = []
for i in range(k_optimal):
cluster_prices = df_train[df_train['Cluster'] == i]['Price']
stats_data.append([
f'Cluster {i}',
len(cluster_prices),
f"${cluster_prices.min():.2f}",
f"${cluster_prices.max():.2f}",
f"${cluster_prices.mean():.2f}",
f"${cluster_prices.median():.2f}",
f"${cluster_prices.std():.2f}",
f"${medoids_original_sorted[i, 0]:.2f}"
])
table = ax.table(cellText=stats_data,
colLabels=['Cluster', 'Size', 'Min', 'Max', 'Mean', 'Median', 'Std', 'Medoid'],
cellLoc='center',
loc='center',
bbox=[0, 0, 1, 1])
table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1, 2.5)
for i in range(8):
table[(0, i)].set_facecolor('#34495e')
table[(0, i)].set_text_props(weight='bold', color='white')
for i in range(k_optimal):
for j in range(8):
table[(i+1, j)].set_facecolor(colors[i])
table[(i+1, j)].set_alpha(0.3)
ax.set_title('Cluster Statistics Summary (Ordered by Medoid)', fontsize=15, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()
In [45]:
# =====================================================================
# PRINT SUMMARY
# =====================================================================
print("\n" + "="*70)
print("📊 CLUSTERING SUMMARY (Ordered by Medoid Value)")
print("="*70)
print(f"Method: CLARA (Clustering Large Applications)")
print(f"Optimal k: {k_optimal}")
print(f"Silhouette Score: {sil_score:.4f}")
print(f"Total data points: {len(df)}")
print(f"\nCluster Statistics (Ordered from lowest to highest medoid):")
print("-"*70)
for i in range(k_optimal):
cluster_data = df_train[df_train['Cluster'] == i]
print(f"Cluster {i}:")
print(f" Medoid: ${medoids_original_sorted[i, 0]:.2f} ⭐")
print(f" Size: {len(cluster_data)} ({len(cluster_data)/len(df_train)*100:.1f}%)")
print(f" Price Range: ${cluster_data['Price'].min():.2f} - ${cluster_data['Price'].max():.2f}")
print(f" Mean: ${cluster_data['Price'].mean():.2f}")
print()
print("="*70)
====================================================================== 📊 CLUSTERING SUMMARY (Ordered by Medoid Value) ====================================================================== Method: CLARA (Clustering Large Applications) Optimal k: 6 Silhouette Score: 0.5566 Total data points: 1291 Cluster Statistics (Ordered from lowest to highest medoid): ---------------------------------------------------------------------- Cluster 0: Medoid: $42.40 ⭐ Size: 81 (7.8%) Price Range: $37.46 - $46.06 Mean: $42.76 Cluster 1: Medoid: $51.80 ⭐ Size: 71 (6.9%) Price Range: $47.42 - $63.54 Mean: $55.32 Cluster 2: Medoid: $75.29 ⭐ Size: 307 (29.7%) Price Range: $63.67 - $79.70 Mean: $73.67 Cluster 3: Medoid: $84.18 ⭐ Size: 334 (32.4%) Price Range: $79.77 - $89.42 Mean: $84.20 Cluster 4: Medoid: $94.67 ⭐ Size: 141 (13.7%) Price Range: $89.47 - $102.32 Mean: $94.21 Cluster 5: Medoid: $110.23 ⭐ Size: 98 (9.5%) Price Range: $102.46 - $127.98 Mean: $111.89 ======================================================================
5.3 CLARA-FTSMC¶
5.3.1. Fungsi Keanggotaan (Trapezoid using IQR)¶
In [46]:
print("\n" + "-"*110)
print("Alasan Penggunaan Fungsi Keanggotaan Trapezoid")
print("-"*110)
print("✅ Sesuai dengan hasil CLARA: Setiap cluster sudah memiliki batas yang jelas dan tidak overlap")
print("✅ Ada plateau (μ = 1.0): Area dengan keanggotaan penuh di sekitar medoid")
print("✅ Lebih stabil: Tidak sensitif terhadap noise di tengah cluster")
print("✅ Interpretasi bisnis jelas: Harga di sekitar medoid = 'fully belongs' to that state")
-------------------------------------------------------------------------------------------------------------- Alasan Penggunaan Fungsi Keanggotaan Trapezoid -------------------------------------------------------------------------------------------------------------- ✅ Sesuai dengan hasil CLARA: Setiap cluster sudah memiliki batas yang jelas dan tidak overlap ✅ Ada plateau (μ = 1.0): Area dengan keanggotaan penuh di sekitar medoid ✅ Lebih stabil: Tidak sensitif terhadap noise di tengah cluster ✅ Interpretasi bisnis jelas: Harga di sekitar medoid = 'fully belongs' to that state
In [47]:
print("\n📊 5.3.1: Calculating Quartiles (Q1, Q3) and IQR...")
cluster_stats = []
for i in range(k_optimal):
# Get cluster data (original scale)
cluster_data = df_train[df_train['Cluster'] == i]['Price'].values
# Calculate quartiles and statistics
Q1 = np.percentile(cluster_data, 25)
Q2 = np.percentile(cluster_data, 50) # Median
Q3 = np.percentile(cluster_data, 75)
IQR = Q3 - Q1
# Store statistics
cluster_stats.append({
'cluster': i,
'size': len(cluster_data),
'min': cluster_data.min(),
'max': cluster_data.max(),
'Q1': Q1,
'Q2_median': Q2,
'Q3': Q3,
'IQR': IQR,
'medoid': medoids_original_sorted[i, 0],
'mean': cluster_data.mean(),
'std': cluster_data.std()
})
# Convert to DataFrame for nice display
df_cluster_stats = pd.DataFrame(cluster_stats)
print("\n✅ Quartile Calculation Complete!")
print("\n" + "-"*110)
print("CLUSTER STATISTICS (Q1, Q2, Q3, IQR)")
print("-"*110)
print(df_cluster_stats.to_string(index=False, float_format=lambda x: f'{x:.2f}'))
print("-"*110)
# Calculate plateau coverage
cluster_ranges = df_cluster_stats['max'] - df_cluster_stats['min']
plateau_coverage = (df_cluster_stats['IQR'] / cluster_ranges * 100)
print("\n📊 IQR Coverage of Cluster Range:")
for i in range(k_optimal):
print(f" Cluster {i}: IQR=${df_cluster_stats.iloc[i]['IQR']:.2f} "
f"= {plateau_coverage.iloc[i]:.1f}% of range "
f"(${cluster_ranges.iloc[i]:.2f})")
print(f"\n Average IQR Coverage: {plateau_coverage.mean():.1f}%")
print("-"*110)
📊 5.3.1: Calculating Quartiles (Q1, Q3) and IQR...
✅ Quartile Calculation Complete!
--------------------------------------------------------------------------------------------------------------
CLUSTER STATISTICS (Q1, Q2, Q3, IQR)
--------------------------------------------------------------------------------------------------------------
cluster size min max Q1 Q2_median Q3 IQR medoid mean std
0 81 37.46 46.06 41.29 42.85 44.43 3.14 42.40 42.76 2.06
1 71 47.42 63.54 50.88 55.66 60.67 9.79 51.80 55.32 5.21
2 307 63.67 79.70 70.61 74.42 77.09 6.48 75.29 73.67 4.30
3 334 79.77 89.42 82.43 84.05 85.90 3.47 84.18 84.20 2.32
4 141 89.47 102.32 91.41 93.54 96.49 5.08 94.67 94.21 3.33
5 98 102.46 127.98 107.04 111.59 115.59 8.55 110.23 111.89 5.93
--------------------------------------------------------------------------------------------------------------
📊 IQR Coverage of Cluster Range:
Cluster 0: IQR=$3.14 = 36.5% of range ($8.60)
Cluster 1: IQR=$9.79 = 60.7% of range ($16.12)
Cluster 2: IQR=$6.48 = 40.5% of range ($16.03)
Cluster 3: IQR=$3.47 = 35.9% of range ($9.65)
Cluster 4: IQR=$5.08 = 39.5% of range ($12.85)
Cluster 5: IQR=$8.55 = 33.5% of range ($25.52)
Average IQR Coverage: 41.1%
--------------------------------------------------------------------------------------------------------------
In [48]:
print("\n📐 5.3.2: Defining Trapezoid Parameters (Direct Q1-Q3)...")
fuzzy_sets = []
for i in range(k_optimal):
cluster_data = df_train[df_train['Cluster'] == i]['Price'].values
# Get quartiles
Q1 = cluster_stats[i]['Q1']
Q3 = cluster_stats[i]['Q3']
IQR = cluster_stats[i]['IQR']
medoid = cluster_stats[i]['medoid']
cluster_min = cluster_stats[i]['min']
cluster_max = cluster_stats[i]['max']
# Core region [b, c] - DIRECTLY USE Q1 and Q3 ✅
b = Q1 # 25th percentile - start of plateau
c = Q3 # 75th percentile - end of plateau
# Support region [a, d] - transition boundaries
if i == 0: # First cluster (leftmost)
# a: extend below minimum to capture lower tail
a = cluster_min - (cluster_max - cluster_min) * 0.1
# d: midpoint to next cluster's medoid
next_medoid = cluster_stats[i+1]['medoid']
d = (medoid + next_medoid) / 2
# Ensure d >= Q3 (no overlap with core)
if d < Q3:
d = Q3 + IQR * 0.3
elif i == k_optimal - 1: # Last cluster (rightmost)
# a: midpoint from previous cluster's medoid
prev_medoid = cluster_stats[i-1]['medoid']
a = (prev_medoid + medoid) / 2
# Ensure a <= Q1 (no overlap with core)
if a > Q1:
a = Q1 - IQR * 0.3
# d: extend above maximum to capture upper tail
d = cluster_max + (cluster_max - cluster_min) * 0.1
else: # Middle clusters
# a: midpoint from previous cluster's medoid
prev_medoid = cluster_stats[i-1]['medoid']
a = (prev_medoid + medoid) / 2
# Adjust if a > Q1 (overlaps with core)
if a > Q1:
a = Q1 - IQR * 0.3
# d: midpoint to next cluster's medoid
next_medoid = cluster_stats[i+1]['medoid']
d = (medoid + next_medoid) / 2
# Adjust if d < Q3 (overlaps with core)
if d < Q3:
d = Q3 + IQR * 0.3
fuzzy_sets.append({
'cluster': i,
'label': f'A{i}',
'a': a,
'b': b, # Q1
'c': c, # Q3
'd': d,
'medoid': medoid,
'Q1': Q1,
'Q3': Q3,
'IQR': IQR,
'plateau_width': c - b, # = IQR
'support_width': d - a
})
# Convert to DataFrame
df_fuzzy_sets = pd.DataFrame(fuzzy_sets)
print("\n✅ Trapezoid Parameters Defined (Q1-Q3 Based)!")
print("\n" + "-"*115)
print("FUZZY SET PARAMETERS (Trapezoid: [a, b=Q1, c=Q3, d])")
print("-"*115)
print(df_fuzzy_sets[['label', 'cluster', 'a', 'b', 'c', 'd', 'medoid', 'plateau_width', 'support_width']]
.to_string(index=False, float_format=lambda x: f'{x:.2f}'))
print("-"*115)
print("\n📖 INTERPRETATION:")
print(" • a: Lower support boundary (μ=0 starts)")
print(" • b: Q1 (25th percentile) - Plateau starts (μ=1 starts)")
print(" • c: Q3 (75th percentile) - Plateau ends (μ=1 ends)")
print(" • d: Upper support boundary (μ=0 ends)")
print(" • Plateau [b,c] = IQR = MIDDLE 50% of cluster data ✅")
print(" • Data between Q1 and Q3 has FULL membership (μ=1.0)")
print("-"*115)
# Validate trapezoid constraints
print("\n🔍 Validating Trapezoid Constraints:")
violations = []
for i in range(k_optimal):
fs = df_fuzzy_sets.iloc[i]
if not (fs['a'] < fs['b'] <= fs['c'] < fs['d']):
violations.append(f"Cluster {i}: {fs['a']:.2f} < {fs['b']:.2f} <= {fs['c']:.2f} < {fs['d']:.2f}")
if violations:
print(" ⚠️ Constraint violations found:")
for v in violations:
print(f" {v}")
else:
print(" ✅ All trapezoid constraints satisfied: a < b ≤ c < d")
# Calculate plateau statistics
plateau_widths = df_fuzzy_sets['plateau_width'].values
support_widths = df_fuzzy_sets['support_width'].values
print(f"\n📊 Plateau Width Statistics:")
print(f" Mean Plateau: ${plateau_widths.mean():.2f}")
print(f" Std Plateau: ${plateau_widths.std():.2f}")
print(f" Min Plateau: ${plateau_widths.min():.2f} (Cluster {plateau_widths.argmin()})")
print(f" Max Plateau: ${plateau_widths.max():.2f} (Cluster {plateau_widths.argmax()})")
print(f"\n📏 Support Width Statistics:")
print(f" Mean Support: ${support_widths.mean():.2f}")
print(f" Std Support: ${support_widths.std():.2f}")
📐 5.3.2: Defining Trapezoid Parameters (Direct Q1-Q3)... ✅ Trapezoid Parameters Defined (Q1-Q3 Based)! ------------------------------------------------------------------------------------------------------------------- FUZZY SET PARAMETERS (Trapezoid: [a, b=Q1, c=Q3, d]) ------------------------------------------------------------------------------------------------------------------- label cluster a b c d medoid plateau_width support_width A0 0 36.60 41.29 44.43 47.10 42.40 3.14 10.50 A1 1 47.10 50.88 60.67 63.55 51.80 9.79 16.45 A2 2 63.55 70.61 77.09 79.74 75.29 6.48 16.19 A3 3 79.74 82.43 85.90 89.43 84.18 3.47 9.69 A4 4 89.43 91.41 96.49 102.45 94.67 5.08 13.02 A5 5 102.45 107.04 115.59 130.53 110.23 8.55 28.08 ------------------------------------------------------------------------------------------------------------------- 📖 INTERPRETATION: • a: Lower support boundary (μ=0 starts) • b: Q1 (25th percentile) - Plateau starts (μ=1 starts) • c: Q3 (75th percentile) - Plateau ends (μ=1 ends) • d: Upper support boundary (μ=0 ends) • Plateau [b,c] = IQR = MIDDLE 50% of cluster data ✅ • Data between Q1 and Q3 has FULL membership (μ=1.0) ------------------------------------------------------------------------------------------------------------------- 🔍 Validating Trapezoid Constraints: ✅ All trapezoid constraints satisfied: a < b ≤ c < d 📊 Plateau Width Statistics: Mean Plateau: $6.08 Std Plateau: $2.47 Min Plateau: $3.14 (Cluster 0) Max Plateau: $9.79 (Cluster 1) 📏 Support Width Statistics: Mean Support: $15.66 Std Support: $6.12
In [49]:
def trapezoid_membership(x, a, b, c, d):
"""
Trapezoid membership function dengan boundary handling yang BENAR
FIX: Ketika x tepat di boundary kanan (x == d), beri membership minimum
sehingga data di boundary masuk ke fuzzy set di bawah (sesuai keinginan)
Parameters:
-----------
x : float or array
Input value(s) - Price
a, b, c, d : float
Trapezoid parameters where a < b <= c < d
- [a, b]: left slope (μ: 0→1)
- [b, c]: plateau (μ: 1)
- [c, d]: right slope (μ: 1→0)
Boundary Handling:
------------------
- Gunakan <= untuk batas kiri (x <= a → μ = 0)
- Gunakan <= untuk batas kanan (x <= d → μ > 0)
- Di boundary point (x == d), berikan membership minimum
- Titik perbatasan akan masuk ke fuzzy set di BAWAH
Returns:
--------
mu : float or array
Membership degree [0, 1]
"""
MIN_MEMBERSHIP = 0.0001 # Membership minimum untuk boundary points
if isinstance(x, (int, float)):
if x <= a:
return 0.0
elif x > d:
return 0.0
elif a < x < b:
return (x - a) / (b - a) if (b - a) > 0 else 1.0
elif b <= x <= c:
return 1.0
elif c < x <= d:
if (d - c) == 0:
return MIN_MEMBERSHIP
slope_value = (d - x) / (d - c)
# KUNCI: Ensure minimum membership at boundary
return max(slope_value, MIN_MEMBERSHIP)
else:
return 0.0
else:
# Array input
mu = np.zeros_like(x, dtype=float)
# Left slope: a < x < b
mask1 = (x > a) & (x < b)
if np.any(mask1) and (b - a) > 0:
mu[mask1] = (x[mask1] - a) / (b - a)
# Plateau: b <= x <= c (Q1 to Q3)
mask2 = (x >= b) & (x <= c)
mu[mask2] = 1.0
# Right slope: c < x <= d
mask3 = (x > c) & (x <= d)
if np.any(mask3) and (d - c) > 0:
mu[mask3] = (d - x[mask3]) / (d - c)
# KUNCI: Ensure minimum membership at boundary points
mu[mask3] = np.maximum(mu[mask3], MIN_MEMBERSHIP)
return mu
print("✅ Trapezoid Membership Function Defined (BOUNDARY FIX VERSION)!")
print("\n Formula:")
print(" ⎧ 0 if x ≤ a or x > d")
print(" ⎪ (x-a)/(b-a) if a < x < b")
print(" μ(x) = ⎨ 1 if b ≤ x ≤ c ← PLATEAU (Q1 to Q3)")
print(" ⎪ max((d-x)/(d-c), ε) if c < x ≤ d ← FIX: min membership at boundary")
print(" ⎩ 0 otherwise")
print("\n 📌 Boundary Rule: Titik batas masuk ke fuzzy set di BAWAH")
print(" Contoh: x=80.1 di batas A1|A2 → masuk A1 (bukan A2)")
✅ Trapezoid Membership Function Defined (BOUNDARY FIX VERSION)!
Formula:
⎧ 0 if x ≤ a or x > d
⎪ (x-a)/(b-a) if a < x < b
μ(x) = ⎨ 1 if b ≤ x ≤ c ← PLATEAU (Q1 to Q3)
⎪ max((d-x)/(d-c), ε) if c < x ≤ d ← FIX: min membership at boundary
⎩ 0 otherwise
📌 Boundary Rule: Titik batas masuk ke fuzzy set di BAWAH
Contoh: x=80.1 di batas A1|A2 → masuk A1 (bukan A2)
In [50]:
print("\n📊 5.3.4: Calculating Membership Degrees for Training Data...")
# Create membership matrix (n_samples × k_clusters)
n_train = len(df_train)
membership_matrix = np.zeros((n_train, k_optimal))
for i in range(k_optimal):
params = df_fuzzy_sets.iloc[i]
membership_matrix[:, i] = trapezoid_membership(
df_train['Price'].values,
params['a'], params['b'], params['c'], params['d']
)
# Add membership columns to dataframe
for i in range(k_optimal):
df_train[f'mu_A{i}'] = membership_matrix[:, i]
# Calculate sum of memberships
df_train['mu_sum'] = membership_matrix.sum(axis=1)
# Find dominant fuzzy set (max membership)
df_train['max_mu'] = membership_matrix.max(axis=1)
df_train['fuzzy_state'] = membership_matrix.argmax(axis=1)
print(f"✅ Membership calculation complete!")
print(f"\n📋 Sample Membership Degrees (first 15 rows):")
sample_cols = ['Date', 'Price', 'Cluster'] + [f'mu_A{i}' for i in range(k_optimal)] + ['mu_sum']
print(df_train[sample_cols].head(15).to_string(index=False))
📊 5.3.4: Calculating Membership Degrees for Training Data...
✅ Membership calculation complete!
📋 Sample Membership Degrees (first 15 rows):
Date Price Cluster mu_A0 mu_A1 mu_A2 mu_A3 mu_A4 mu_A5 mu_sum
2020-08-03 44.15 0 1.000000 0.0 0.0 0.0 0.0 0.0 1.000000
2020-08-04 44.43 0 1.000000 0.0 0.0 0.0 0.0 0.0 1.000000
2020-08-05 45.17 0 0.722846 0.0 0.0 0.0 0.0 0.0 0.722846
2020-08-06 45.09 0 0.752809 0.0 0.0 0.0 0.0 0.0 0.752809
2020-08-07 44.40 0 1.000000 0.0 0.0 0.0 0.0 0.0 1.000000
2020-08-10 44.99 0 0.790262 0.0 0.0 0.0 0.0 0.0 0.790262
2020-08-11 44.50 0 0.973783 0.0 0.0 0.0 0.0 0.0 0.973783
2020-08-12 45.43 0 0.625468 0.0 0.0 0.0 0.0 0.0 0.625468
2020-08-13 44.96 0 0.801498 0.0 0.0 0.0 0.0 0.0 0.801498
2020-08-14 44.80 0 0.861423 0.0 0.0 0.0 0.0 0.0 0.861423
2020-08-17 45.37 0 0.647940 0.0 0.0 0.0 0.0 0.0 0.647940
2020-08-18 45.46 0 0.614232 0.0 0.0 0.0 0.0 0.0 0.614232
2020-08-19 45.37 0 0.647940 0.0 0.0 0.0 0.0 0.0 0.647940
2020-08-20 44.90 0 0.823970 0.0 0.0 0.0 0.0 0.0 0.823970
2020-08-21 44.35 0 1.000000 0.0 0.0 0.0 0.0 0.0 1.000000
In [51]:
print("\n🔍 5.3.5: Membership Quality Analysis...")
# Check membership properties
full_membership_points = ((df_train['max_mu'] == 1.0)).sum()
partial_membership_points = ((df_train['max_mu'] > 0) & (df_train['max_mu'] < 1.0)).sum()
zero_membership_points = (df_train['mu_sum'] == 0).sum()
overlapping_points = (df_train['mu_sum'] > 1.01).sum()
print(f"\n📊 Membership Distribution:")
print(f" Points with FULL membership (μ=1.0 in at least one set): {full_membership_points} ({full_membership_points/n_train*100:.1f}%)")
print(f" Points with PARTIAL membership (0<μ<1.0 all sets): {partial_membership_points} ({partial_membership_points/n_train*100:.1f}%)")
print(f" Points with NO membership (μ=0 all sets): {zero_membership_points} ({zero_membership_points/n_train*100:.1f}%)")
print(f" Points with OVERLAPPING membership (sum>1.01): {overlapping_points} ({overlapping_points/n_train*100:.1f}%)")
# Analyze membership sum statistics
print(f"\n📈 Membership Sum Statistics:")
print(f" Mean μ_sum: {df_train['mu_sum'].mean():.4f}")
print(f" Std μ_sum: {df_train['mu_sum'].std():.4f}")
print(f" Min μ_sum: {df_train['mu_sum'].min():.4f}")
print(f" Max μ_sum: {df_train['mu_sum'].max():.4f}")
# Count points in plateau region per cluster
print(f"\n🎯 Points in Plateau Region (Q1-Q3) per Cluster:")
for i in range(k_optimal):
in_plateau = (df_train['Cluster'] == i) & (df_train[f'mu_A{i}'] == 1.0)
total_cluster = (df_train['Cluster'] == i).sum()
print(f" Cluster {i}: {in_plateau.sum()}/{total_cluster} ({in_plateau.sum()/total_cluster*100:.1f}%) have μ=1.0")
# Overall quality assessment
print(f"\n✅ QUALITY ASSESSMENT:")
if zero_membership_points == 0 and overlapping_points < n_train * 0.05:
print(" ✅ EXCELLENT: No gaps, minimal overlap (<5%)")
elif zero_membership_points == 0:
print(" ✅ GOOD: No gaps, some overlap at boundaries")
elif zero_membership_points < n_train * 0.01:
print(" ⚠️ ACCEPTABLE: Very few gaps (<1%)")
else:
print(" ⚠️ WARNING: Significant gaps or overlap detected")
🔍 5.3.5: Membership Quality Analysis... 📊 Membership Distribution: Points with FULL membership (μ=1.0 in at least one set): 515 (49.9%) Points with PARTIAL membership (0<μ<1.0 all sets): 517 (50.1%) Points with NO membership (μ=0 all sets): 0 (0.0%) Points with OVERLAPPING membership (sum>1.01): 0 (0.0%) 📈 Membership Sum Statistics: Mean μ_sum: 0.7801 Std μ_sum: 0.2998 Min μ_sum: 0.0014 Max μ_sum: 1.0000 🎯 Points in Plateau Region (Q1-Q3) per Cluster: Cluster 0: 41/81 (50.6%) have μ=1.0 Cluster 1: 35/71 (49.3%) have μ=1.0 Cluster 2: 153/307 (49.8%) have μ=1.0 Cluster 3: 167/334 (50.0%) have μ=1.0 Cluster 4: 71/141 (50.4%) have μ=1.0 Cluster 5: 48/98 (49.0%) have μ=1.0 ✅ QUALITY ASSESSMENT: ✅ EXCELLENT: No gaps, minimal overlap (<5%)
In [52]:
print("\n📈 5.3.6: Creating Visualizations...")
# Create dense x-axis for smooth curves
x_min = df_train['Price'].min() - 5
x_max = df_train['Price'].max() + 5
x_dense = np.linspace(x_min, x_max, 2000)
plt.figure(figsize=(18, 8))
# Plot membership functions
for i in range(k_optimal):
params = df_fuzzy_sets.iloc[i]
mu_values = trapezoid_membership(x_dense, params['a'], params['b'],
params['c'], params['d'])
# Plot membership curve
plt.plot(x_dense, mu_values, linewidth=3, label=f'A{i} (${params["medoid"]:.2f})',
color=colors[i], alpha=0.9)
# Mark medoid
plt.axvline(params['medoid'], color=colors[i], linestyle='--',
linewidth=1.5, alpha=0.4)
# Highlight plateau region (Q1 to Q3)
plt.fill_between(x_dense, 0, mu_values,
where=(x_dense >= params['b']) & (x_dense <= params['c']),
color=colors[i], alpha=0.25, label=f'A{i} Plateau (Q1-Q3)')
# Mark Q1 and Q3
plt.axvline(params['b'], color=colors[i], linestyle=':',
linewidth=1, alpha=0.6)
plt.axvline(params['c'], color=colors[i], linestyle=':',
linewidth=1, alpha=0.6)
# Mark actual data points at bottom
for i in range(k_optimal):
cluster_prices = df_train[df_train['Cluster'] == i]['Price'].values
y_pos = np.full_like(cluster_prices, -0.08 - (i * 0.02))
plt.scatter(cluster_prices, y_pos, color=colors[i], alpha=0.4, s=15, marker='|')
plt.xlabel('Price (USD per barrel)', fontsize=14, fontweight='bold')
plt.ylabel('Membership Degree (μ)', fontsize=14, fontweight='bold')
plt.title(f'Fuzzy Membership Functions - Q1-Q3 Based Trapezoid (k={k_optimal})\nPlateau = Middle 50% of Each Cluster (IQR)',
fontsize=16, fontweight='bold')
plt.legend(loc='upper right', fontsize=10, ncol=1, framealpha=0.9)
plt.grid(True, alpha=0.3, linestyle='--')
plt.ylim(-0.2, 1.15)
plt.tight_layout()
plt.show()
📈 5.3.6: Creating Visualizations...
In [53]:
print("\n📊 Creating individual membership function plots...")
fig, axes = plt.subplots(2, 3, figsize=(20, 11))
axes = axes.flatten()
for i in range(k_optimal):
ax = axes[i]
params = df_fuzzy_sets.iloc[i]
# Calculate membership
mu_values = trapezoid_membership(x_dense, params['a'], params['b'],
params['c'], params['d'])
# Plot membership function
ax.plot(x_dense, mu_values, linewidth=3.5, color=colors[i],
label=f'μ_{i}(x)', zorder=3)
ax.fill_between(x_dense, 0, mu_values, color=colors[i], alpha=0.2)
# Mark trapezoid key points
trapezoid_points = [params['a'], params['b'], params['c'], params['d']]
trapezoid_mu = [0, 1, 1, 0]
ax.scatter(trapezoid_points, trapezoid_mu, color='red', s=120,
zorder=5, edgecolors='black', linewidths=2.5,
label='Trapezoid vertices')
# Annotate points
ax.text(params['a'], -0.15, f'a\n${params["a"]:.1f}',
ha='center', fontsize=9, color='red', fontweight='bold')
ax.text(params['b'], 1.05, f'b=Q1\n${params["b"]:.1f}',
ha='center', fontsize=9, color='darkgreen', fontweight='bold')
ax.text(params['c'], 1.05, f'c=Q3\n${params["c"]:.1f}',
ha='center', fontsize=9, color='darkgreen', fontweight='bold')
ax.text(params['d'], -0.15, f'd\n${params["d"]:.1f}',
ha='center', fontsize=9, color='red', fontweight='bold')
# Mark medoid
ax.axvline(params['medoid'], color='darkred', linestyle='--',
linewidth=2.5, alpha=0.6, label=f'Medoid: ${params["medoid"]:.2f}')
# Highlight plateau (Q1-Q3)
ax.axvspan(params['b'], params['c'], alpha=0.3, color='yellow',
label=f'Plateau (IQR): ${params["IQR"]:.2f}', zorder=1)
# Actual data distribution (histogram)
cluster_data = df_train[df_train['Cluster'] == i]['Price'].values
ax2 = ax.twinx()
ax2.hist(cluster_data, bins=25, alpha=0.25, color=colors[i],
edgecolor='black', linewidth=0.5)
ax2.set_ylabel('Frequency', fontsize=10, color='gray')
ax2.tick_params(axis='y', labelcolor='gray')
# Labels and title
ax.set_xlabel('Price (USD)', fontsize=11, fontweight='bold')
ax.set_ylabel('Membership (μ)', fontsize=11, fontweight='bold')
ax.set_title(f'Cluster {i}: A{i} (n={len(cluster_data)})',
fontsize=13, fontweight='bold')
ax.legend(fontsize=8, loc='upper left')
ax.grid(True, alpha=0.3)
ax.set_ylim(-0.2, 1.25)
# Add statistics box
textstr = f'Q1: ${params["Q1"]:.2f}\nQ3: ${params["Q3"]:.2f}\nIQR: ${params["IQR"]:.2f}'
props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
ax.text(0.98, 0.97, textstr, transform=ax.transAxes, fontsize=9,
verticalalignment='top', horizontalalignment='right', bbox=props)
plt.suptitle('Individual Fuzzy Membership Functions (Q1-Q3 Trapezoid) with Data Distribution',
fontsize=16, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()
📊 Creating individual membership function plots...
In [54]:
print("\n🔥 Creating membership heatmap...")
# Sample data for visualization (every 5th point for clarity)
sample_step = 5
sample_data = df_train.iloc[::sample_step].reset_index(drop=True)
# Create heatmap data
heatmap_data = sample_data[[f'mu_A{i}' for i in range(k_optimal)]].T.values
plt.figure(figsize=(18, 7))
im = plt.imshow(heatmap_data, aspect='auto', cmap='YlOrRd',
interpolation='nearest', vmin=0, vmax=1)
# Colorbar
cbar = plt.colorbar(im, label='Membership Degree (μ)', pad=0.02)
cbar.ax.tick_params(labelsize=11)
# Labels
plt.yticks(range(k_optimal), [f'A{i}' for i in range(k_optimal)], fontsize=12)
plt.ylabel('Fuzzy Sets', fontsize=13, fontweight='bold')
plt.xlabel('Data Points (sampled every 5th point)', fontsize=13, fontweight='bold')
plt.title('Membership Degree Heatmap - Training Data', fontsize=15, fontweight='bold')
# Add grid
for i in range(k_optimal + 1):
plt.axhline(i - 0.5, color='white', linewidth=1.5)
plt.tight_layout()
plt.show()
🔥 Creating membership heatmap...
In [55]:
print("\n📋 Creating parameter table visualization...")
fig, ax = plt.subplots(figsize=(18, 6))
ax.axis('off')
table_data = []
for i in range(k_optimal):
stats = cluster_stats[i]
params = df_fuzzy_sets.iloc[i]
cluster_range = stats['max'] - stats['min']
plateau_pct = (params['IQR'] / cluster_range) * 100
table_data.append([
f'A{i}',
stats['size'],
f"${params['a']:.2f}",
f"${params['b']:.2f}",
f"${params['c']:.2f}",
f"${params['d']:.2f}",
f"${params['medoid']:.2f}",
f"${params['IQR']:.2f}",
f"{plateau_pct:.1f}%",
f"${stats['min']:.2f}",
f"${stats['max']:.2f}"
])
table = ax.table(
cellText=table_data,
colLabels=['Fuzzy\nSet', 'Size', 'a\n(Lower)', 'b=Q1\n(Plateau Start)',
'c=Q3\n(Plateau End)', 'd\n(Upper)', 'Medoid',
'IQR\n(Plateau)', 'IQR %\nof Range', 'Cluster\nMin', 'Cluster\nMax'],
cellLoc='center',
loc='center',
bbox=[0, 0, 1, 1]
)
table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1, 3)
# Header styling
for i in range(11):
table[(0, i)].set_facecolor('#2c3e50')
table[(0, i)].set_text_props(weight='bold', color='white', fontsize=10)
# Row coloring
for i in range(k_optimal):
for j in range(11):
table[(i+1, j)].set_facecolor(colors[i])
table[(i+1, j)].set_alpha(0.4)
table[(i+1, j)].set_text_props(fontsize=10)
# Highlight Q1-Q3 columns
for i in range(k_optimal):
table[(i+1, 3)].set_facecolor('lightgreen')
table[(i+1, 3)].set_alpha(0.7)
table[(i+1, 4)].set_facecolor('lightgreen')
table[(i+1, 4)].set_alpha(0.7)
ax.set_title('Fuzzy Membership Function Parameters (Q1-Q3 Based Trapezoid)\nGreen = Plateau Boundaries (Middle 50% of Data)',
fontsize=15, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()
📋 Creating parameter table visualization...
In [56]:
print("\n📊 Creating membership scatter plots...")
fig, axes = plt.subplots(2, 3, figsize=(20, 11))
axes = axes.flatten()
for i in range(k_optimal):
ax = axes[i]
params = df_fuzzy_sets.iloc[i]
# Scatter plot: Price vs Membership
scatter = ax.scatter(df_train['Price'], df_train[f'mu_A{i}'],
c=df_train[f'mu_A{i}'], cmap='YlOrRd',
s=25, alpha=0.6, edgecolors='black', linewidths=0.3,
vmin=0, vmax=1)
# Mark key boundaries
ax.axvline(params['a'], color='red', linestyle=':', linewidth=1.5,
alpha=0.5, label=f'a=${params["a"]:.1f}')
ax.axvline(params['b'], color='green', linestyle='--', linewidth=2,
alpha=0.7, label=f'Q1=${params["b"]:.1f}')
ax.axvline(params['c'], color='green', linestyle='--', linewidth=2,
alpha=0.7, label=f'Q3=${params["c"]:.1f}')
ax.axvline(params['d'], color='red', linestyle=':', linewidth=1.5,
alpha=0.5, label=f'd=${params["d"]:.1f}')
ax.axvline(params['medoid'], color='darkred', linestyle='-', linewidth=2.5,
alpha=0.8, label=f'Medoid=${params["medoid"]:.1f}')
# Highlight plateau region
ax.axvspan(params['b'], params['c'], alpha=0.15, color='yellow', zorder=1)
ax.axhline(1.0, color='black', linestyle='-', linewidth=1, alpha=0.3)
# Colorbar
plt.colorbar(scatter, ax=ax, label='μ', pad=0.02)
# Labels
ax.set_xlabel('Price (USD)', fontsize=11, fontweight='bold')
ax.set_ylabel(f'Membership μ_A{i}', fontsize=11, fontweight='bold')
ax.set_title(f'Cluster {i}: Membership Degree vs Price',
fontsize=12, fontweight='bold')
ax.legend(fontsize=8, loc='upper left')
ax.grid(True, alpha=0.3)
ax.set_ylim(-0.1, 1.15)
📊 Creating membership scatter plots...
5.3.2. Build Model FTSMC¶
In [57]:
# ============================================================================
# STEP 1: ASSIGN FUZZY STATES BASED ON MEMBERSHIP FUNCTION
# ============================================================================
print("\n📊 STEP 1: Assigning Fuzzy States to Training Data")
print("-"*100)
# Fungsi untuk menentukan fuzzy state berdasarkan membership maksimum
def assign_fuzzy_state(row, k_optimal):
"""
Assign fuzzy state based on maximum membership degree
"""
memberships = [row[f'mu_A{i}'] for i in range(k_optimal)]
max_membership = max(memberships)
# Jika ada membership yang sama (tie), pilih state dengan index terkecil
fuzzy_state = memberships.index(max_membership)
return fuzzy_state, max_membership
# Assign fuzzy states
fuzzy_states = []
max_memberships = []
for idx, row in df_train.iterrows():
state, max_mu = assign_fuzzy_state(row, k_optimal)
fuzzy_states.append(state)
max_memberships.append(max_mu)
df_train['Fuzzy_State'] = fuzzy_states
df_train['Max_Membership'] = max_memberships
# Tampilkan hasil assignment
print("\n✅ Fuzzy State Assignment Complete!")
print(f"\nSample Results (first 20 rows):")
sample_cols = ['Date', 'Price', 'Cluster'] + [f'mu_A{i}' for i in range(k_optimal)] + ['Fuzzy_State', 'Max_Membership']
print(df_train[sample_cols].head(20).to_string(index=False))
# Statistik fuzzy states
print("\n📈 Fuzzy State Distribution:")
state_counts = df_train['Fuzzy_State'].value_counts().sort_index()
for state in range(k_optimal):
count = state_counts.get(state, 0)
percentage = (count / len(df_train)) * 100
print(f" State A{state}: {count:4d} occurrences ({percentage:5.2f}%)")
📊 STEP 1: Assigning Fuzzy States to Training Data
----------------------------------------------------------------------------------------------------
✅ Fuzzy State Assignment Complete!
Sample Results (first 20 rows):
Date Price Cluster mu_A0 mu_A1 mu_A2 mu_A3 mu_A4 mu_A5 Fuzzy_State Max_Membership
2020-08-03 44.15 0 1.000000 0.0 0.0 0.0 0.0 0.0 0 1.000000
2020-08-04 44.43 0 1.000000 0.0 0.0 0.0 0.0 0.0 0 1.000000
2020-08-05 45.17 0 0.722846 0.0 0.0 0.0 0.0 0.0 0 0.722846
2020-08-06 45.09 0 0.752809 0.0 0.0 0.0 0.0 0.0 0 0.752809
2020-08-07 44.40 0 1.000000 0.0 0.0 0.0 0.0 0.0 0 1.000000
2020-08-10 44.99 0 0.790262 0.0 0.0 0.0 0.0 0.0 0 0.790262
2020-08-11 44.50 0 0.973783 0.0 0.0 0.0 0.0 0.0 0 0.973783
2020-08-12 45.43 0 0.625468 0.0 0.0 0.0 0.0 0.0 0 0.625468
2020-08-13 44.96 0 0.801498 0.0 0.0 0.0 0.0 0.0 0 0.801498
2020-08-14 44.80 0 0.861423 0.0 0.0 0.0 0.0 0.0 0 0.861423
2020-08-17 45.37 0 0.647940 0.0 0.0 0.0 0.0 0.0 0 0.647940
2020-08-18 45.46 0 0.614232 0.0 0.0 0.0 0.0 0.0 0 0.614232
2020-08-19 45.37 0 0.647940 0.0 0.0 0.0 0.0 0.0 0 0.647940
2020-08-20 44.90 0 0.823970 0.0 0.0 0.0 0.0 0.0 0 0.823970
2020-08-21 44.35 0 1.000000 0.0 0.0 0.0 0.0 0.0 0 1.000000
2020-08-24 45.13 0 0.737828 0.0 0.0 0.0 0.0 0.0 0 0.737828
2020-08-25 45.86 0 0.464419 0.0 0.0 0.0 0.0 0.0 0 0.464419
2020-08-26 45.64 0 0.546816 0.0 0.0 0.0 0.0 0.0 0 0.546816
2020-08-27 45.09 0 0.752809 0.0 0.0 0.0 0.0 0.0 0 0.752809
2020-08-28 45.05 0 0.767790 0.0 0.0 0.0 0.0 0.0 0 0.767790
📈 Fuzzy State Distribution:
State A0: 81 occurrences ( 7.85%)
State A1: 71 occurrences ( 6.88%)
State A2: 307 occurrences (29.75%)
State A3: 334 occurrences (32.36%)
State A4: 141 occurrences (13.66%)
State A5: 98 occurrences ( 9.50%)
In [58]:
# ============================================================================
# STEP 2: CREATE FUZZY LOGICAL RELATIONSHIPS (FLR)
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 2: Creating Fuzzy Logical Relationships (FLR)")
print("-"*100)
# Buat FLR: hubungan antara state pada waktu t-1 dengan state pada waktu t
flr_list = []
for i in range(1, len(df_train)):
current_state = df_train.loc[i, 'Fuzzy_State']
previous_state = df_train.loc[i-1, 'Fuzzy_State']
current_price = df_train.loc[i, 'Price']
previous_price = df_train.loc[i-1, 'Price']
flr_list.append({
'Index': i,
'Date': df_train.loc[i, 'Date'],
'Previous_State': previous_state,
'Current_State': current_state,
'Previous_Price': previous_price,
'Current_Price': current_price,
'FLR': f'A{previous_state} → A{current_state}'
})
df_flr = pd.DataFrame(flr_list)
print(f"\n✅ Total FLR Created: {len(df_flr)}")
print(f"\nSample FLR (first 20 relationships):")
print(df_flr[['Date', 'Previous_State', 'Current_State', 'FLR', 'Previous_Price', 'Current_Price']].head(20).to_string(index=False))
# Hitung frekuensi setiap FLR
flr_frequency = df_flr['FLR'].value_counts()
print(f"\n📊 Top 10 Most Frequent FLR:")
print(flr_frequency.head(10))
====================================================================================================
📊 STEP 2: Creating Fuzzy Logical Relationships (FLR)
----------------------------------------------------------------------------------------------------
✅ Total FLR Created: 1031
Sample FLR (first 20 relationships):
Date Previous_State Current_State FLR Previous_Price Current_Price
2020-08-04 0 0 A0 → A0 44.15 44.43
2020-08-05 0 0 A0 → A0 44.43 45.17
2020-08-06 0 0 A0 → A0 45.17 45.09
2020-08-07 0 0 A0 → A0 45.09 44.40
2020-08-10 0 0 A0 → A0 44.40 44.99
2020-08-11 0 0 A0 → A0 44.99 44.50
2020-08-12 0 0 A0 → A0 44.50 45.43
2020-08-13 0 0 A0 → A0 45.43 44.96
2020-08-14 0 0 A0 → A0 44.96 44.80
2020-08-17 0 0 A0 → A0 44.80 45.37
2020-08-18 0 0 A0 → A0 45.37 45.46
2020-08-19 0 0 A0 → A0 45.46 45.37
2020-08-20 0 0 A0 → A0 45.37 44.90
2020-08-21 0 0 A0 → A0 44.90 44.35
2020-08-24 0 0 A0 → A0 44.35 45.13
2020-08-25 0 0 A0 → A0 45.13 45.86
2020-08-26 0 0 A0 → A0 45.86 45.64
2020-08-27 0 0 A0 → A0 45.64 45.09
2020-08-28 0 0 A0 → A0 45.09 45.05
2020-08-31 0 0 A0 → A0 45.05 45.28
📊 Top 10 Most Frequent FLR:
FLR
A3 → A3 305
A2 → A2 282
A4 → A4 121
A5 → A5 90
A0 → A0 80
A1 → A1 63
A2 → A3 17
A3 → A2 17
A3 → A4 12
A4 → A3 12
Name: count, dtype: int64
In [59]:
# ============================================================================
# STEP 3: CREATE FUZZY LOGICAL RELATIONSHIP GROUPS (FLRG)
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 3: Creating Fuzzy Logical Relationship Groups (FLRG)")
print("-"*100)
# Group FLR berdasarkan Previous_State
flrg_dict = defaultdict(list)
for idx, row in df_flr.iterrows():
prev_state = row['Previous_State']
curr_state = row['Current_State']
flrg_dict[prev_state].append(curr_state)
# Buat FLRG summary
flrg_summary = []
for state in range(k_optimal):
if state in flrg_dict:
transitions = flrg_dict[state]
unique_transitions = sorted(set(transitions))
# Hitung frekuensi setiap transisi
transition_counts = {}
for trans in transitions:
transition_counts[trans] = transition_counts.get(trans, 0) + 1
flrg_summary.append({
'State': f'A{state}',
'Total_Transitions': len(transitions),
'Unique_States': unique_transitions,
'Transition_Counts': transition_counts,
'FLRG': f"A{state} → {', '.join([f'A{s}' for s in unique_transitions])}"
})
else:
flrg_summary.append({
'State': f'A{state}',
'Total_Transitions': 0,
'Unique_States': [],
'Transition_Counts': {},
'FLRG': f"A{state} → ∅ (No transitions)"
})
df_flrg = pd.DataFrame(flrg_summary)
print("\n✅ FLRG Created Successfully!")
print("\n" + "-"*100)
print("FUZZY LOGICAL RELATIONSHIP GROUPS (FLRG)")
print("-"*100)
for idx, row in df_flrg.iterrows():
print(f"\n{row['State']}:")
print(f" FLRG: {row['FLRG']}")
print(f" Total Transitions: {row['Total_Transitions']}")
if row['Transition_Counts']:
print(f" Transition Details:")
for to_state, count in sorted(row['Transition_Counts'].items()):
percentage = (count / row['Total_Transitions']) * 100
print(f" → A{to_state}: {count:4d} times ({percentage:5.2f}%)")
====================================================================================================
📊 STEP 3: Creating Fuzzy Logical Relationship Groups (FLRG)
----------------------------------------------------------------------------------------------------
✅ FLRG Created Successfully!
----------------------------------------------------------------------------------------------------
FUZZY LOGICAL RELATIONSHIP GROUPS (FLRG)
----------------------------------------------------------------------------------------------------
A0:
FLRG: A0 → A0, A1
Total Transitions: 81
Transition Details:
→ A0: 80 times (98.77%)
→ A1: 1 times ( 1.23%)
A1:
FLRG: A1 → A1, A2
Total Transitions: 71
Transition Details:
→ A1: 63 times (88.73%)
→ A2: 8 times (11.27%)
A2:
FLRG: A2 → A1, A2, A3
Total Transitions: 306
Transition Details:
→ A1: 7 times ( 2.29%)
→ A2: 282 times (92.16%)
→ A3: 17 times ( 5.56%)
A3:
FLRG: A3 → A2, A3, A4
Total Transitions: 334
Transition Details:
→ A2: 17 times ( 5.09%)
→ A3: 305 times (91.32%)
→ A4: 12 times ( 3.59%)
A4:
FLRG: A4 → A3, A4, A5
Total Transitions: 141
Transition Details:
→ A3: 12 times ( 8.51%)
→ A4: 121 times (85.82%)
→ A5: 8 times ( 5.67%)
A5:
FLRG: A5 → A4, A5
Total Transitions: 98
Transition Details:
→ A4: 8 times ( 8.16%)
→ A5: 90 times (91.84%)
In [60]:
# ============================================================================
# STEP 4: CREATE MARKOV TRANSITION PROBABILITY MATRIX
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 4: Creating Markov Transition Probability Matrix")
print("-"*100)
# Initialize transition matrix
transition_matrix = np.zeros((k_optimal, k_optimal))
# Hitung transition probabilities
for state in range(k_optimal):
if state in flrg_dict:
transitions = flrg_dict[state]
total = len(transitions)
for to_state in range(k_optimal):
count = transitions.count(to_state)
transition_matrix[state, to_state] = count / total if total > 0 else 0
# Buat DataFrame untuk tampilan yang lebih baik
transition_df = pd.DataFrame(
transition_matrix,
index=[f'A{i}' for i in range(k_optimal)],
columns=[f'A{i}' for i in range(k_optimal)]
)
print("\n✅ Transition Probability Matrix Created!")
print("\n" + "-"*100)
print("MARKOV TRANSITION PROBABILITY MATRIX")
print("-"*100)
print("\nP = ")
print(transition_df.to_string(float_format=lambda x: f'{x:.4f}'))
# Validasi: setiap baris harus sum = 1
print("\n🔍 Matrix Validation:")
row_sums = transition_matrix.sum(axis=1)
print("Row sums (should all be 1.0 or 0.0 for states with no transitions):")
for i, row_sum in enumerate(row_sums):
status = "✓" if abs(row_sum - 1.0) < 0.001 or row_sum == 0 else "✗"
print(f" A{i}: {row_sum:.6f} {status}")
====================================================================================================
📊 STEP 4: Creating Markov Transition Probability Matrix
----------------------------------------------------------------------------------------------------
✅ Transition Probability Matrix Created!
----------------------------------------------------------------------------------------------------
MARKOV TRANSITION PROBABILITY MATRIX
----------------------------------------------------------------------------------------------------
P =
A0 A1 A2 A3 A4 A5
A0 0.9877 0.0123 0.0000 0.0000 0.0000 0.0000
A1 0.0000 0.8873 0.1127 0.0000 0.0000 0.0000
A2 0.0000 0.0229 0.9216 0.0556 0.0000 0.0000
A3 0.0000 0.0000 0.0509 0.9132 0.0359 0.0000
A4 0.0000 0.0000 0.0000 0.0851 0.8582 0.0567
A5 0.0000 0.0000 0.0000 0.0000 0.0816 0.9184
🔍 Matrix Validation:
Row sums (should all be 1.0 or 0.0 for states with no transitions):
A0: 1.000000 ✓
A1: 1.000000 ✓
A2: 1.000000 ✓
A3: 1.000000 ✓
A4: 1.000000 ✓
A5: 1.000000 ✓
In [61]:
# ============================================================================
# STEP 5: FORECASTING USING FTSMC (SEPARATED INITIAL & FINAL)
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 5: Forecasting using FTSMC (Initial & Final Forecast)")
print("-"*100)
# Midpoint dari setiap fuzzy set (untuk defuzzification)
midpoints = []
for i in range(k_optimal):
params = df_fuzzy_sets.iloc[i]
midpoint = (params['b'] + params['c']) / 2 # Midpoint dari plateau (Q1 dan Q3)
midpoints.append(midpoint)
print(f"\n📍 Midpoints for each Fuzzy Set:")
for i, mid in enumerate(midpoints):
print(f" A{i}: ${mid:.2f}")
# Fungsi forecasting INITIAL (tanpa adjustment)
def forecast_ftsmc_initial(current_state, current_price, transition_matrix, midpoints):
"""
Initial forecast using FTSMC (before adjustment)
Rules:
1. If no FLR (empty state): return midpoint
2. If one-to-one relationship: return target midpoint
3. If one-to-many: use weighted average based on transition probabilities
"""
probabilities = transition_matrix[current_state]
# Rule 1: No transitions
if probabilities.sum() == 0:
return midpoints[current_state]
# Rule 2 & 3: Calculate weighted forecast
forecast = 0
for to_state in range(len(midpoints)):
if to_state == current_state:
# Use current price instead of midpoint for same state (Rule 3 formula)
forecast += current_price * probabilities[to_state]
else:
forecast += midpoints[to_state] * probabilities[to_state]
return forecast
# Lakukan forecasting untuk training data
print("\n🔄 Processing Training Data Forecasts...")
print("-"*100)
forecast_results = []
for i in range(len(df_train)):
if i == 0:
# First data point - no previous state
forecast_results.append({
'Index': i,
'Date': df_train.loc[i, 'Date'],
'Y(t)': df_train.loc[i, 'Price'],
'Y(t-1)': None,
'diff(Y(t))': 0,
'Fuzzy_State_t-1': None,
'Initial_Forecast_F(t)': None,
'Final_Forecast_F^(t)': None
})
else:
# Get data
current_price = df_train.loc[i, 'Price']
prev_price = df_train.loc[i-1, 'Price']
prev_state = df_train.loc[i-1, 'Fuzzy_State']
# Calculate diff(Y(t)) = Y(t) - Y(t-1)
diff_yt = current_price - prev_price
# Initial forecast F(t)
initial_forecast = forecast_ftsmc_initial(prev_state, prev_price, transition_matrix, midpoints)
# Final forecast F^(t) = F(t) + diff(Y(t-1))
# diff(Y(t-1)) is the difference for the PREVIOUS time step
if i == 1:
diff_yt_minus1 = 0 # No diff for t=1
else:
diff_yt_minus1 = df_train.loc[i-1, 'Price'] - df_train.loc[i-2, 'Price']
final_forecast = initial_forecast + diff_yt_minus1
forecast_results.append({
'Index': i,
'Date': df_train.loc[i, 'Date'],
'Y(t)': current_price,
'Y(t-1)': prev_price,
'diff(Y(t))': diff_yt,
'Fuzzy_State_t-1': f'A{prev_state}',
'Initial_Forecast_F(t)': initial_forecast,
'Final_Forecast_F^(t)': final_forecast
})
df_forecast_train = pd.DataFrame(forecast_results)
print("\n✅ Forecasting Complete!")
print(f"\n📋 Sample Forecast Results (first 30 rows):")
print(df_forecast_train.head(30).to_string(index=False, float_format=lambda x: f'{x:.2f}' if pd.notna(x) else 'NaN'))
print(f"\n📋 Sample Forecast Results (last 20 rows):")
print(df_forecast_train.tail(20).to_string(index=False, float_format=lambda x: f'{x:.2f}' if pd.notna(x) else 'NaN'))
====================================================================================================
📊 STEP 5: Forecasting using FTSMC (Initial & Final Forecast)
----------------------------------------------------------------------------------------------------
📍 Midpoints for each Fuzzy Set:
A0: $42.86
A1: $55.78
A2: $73.85
A3: $84.16
A4: $93.95
A5: $111.31
🔄 Processing Training Data Forecasts...
----------------------------------------------------------------------------------------------------
✅ Forecasting Complete!
📋 Sample Forecast Results (first 30 rows):
Index Date Y(t) Y(t-1) diff(Y(t)) Fuzzy_State_t-1 Initial_Forecast_F(t) Final_Forecast_F^(t)
0 2020-08-03 44.15 NaN 0.00 None NaN NaN
1 2020-08-04 44.43 44.15 0.28 A0 44.29 44.29
2 2020-08-05 45.17 44.43 0.74 A0 44.57 44.85
3 2020-08-06 45.09 45.17 -0.08 A0 45.30 46.04
4 2020-08-07 44.40 45.09 -0.69 A0 45.22 45.14
5 2020-08-10 44.99 44.40 0.59 A0 44.54 43.85
6 2020-08-11 44.50 44.99 -0.49 A0 45.12 45.71
7 2020-08-12 45.43 44.50 0.93 A0 44.64 44.15
8 2020-08-13 44.96 45.43 -0.47 A0 45.56 46.49
9 2020-08-14 44.80 44.96 -0.16 A0 45.09 44.62
10 2020-08-17 45.37 44.80 0.57 A0 44.94 44.78
11 2020-08-18 45.46 45.37 0.09 A0 45.50 46.07
12 2020-08-19 45.37 45.46 -0.09 A0 45.59 45.68
13 2020-08-20 44.90 45.37 -0.47 A0 45.50 45.41
14 2020-08-21 44.35 44.90 -0.55 A0 45.03 44.56
15 2020-08-24 45.13 44.35 0.78 A0 44.49 43.94
16 2020-08-25 45.86 45.13 0.73 A0 45.26 46.04
17 2020-08-26 45.64 45.86 -0.22 A0 45.98 46.71
18 2020-08-27 45.09 45.64 -0.55 A0 45.77 45.55
19 2020-08-28 45.05 45.09 -0.04 A0 45.22 44.67
20 2020-08-31 45.28 45.05 0.23 A0 45.18 45.14
21 2020-09-01 45.58 45.28 0.30 A0 45.41 45.64
22 2020-09-02 44.43 45.58 -1.15 A0 45.71 46.01
23 2020-09-03 44.07 44.43 -0.36 A0 44.57 43.42
24 2020-09-04 42.66 44.07 -1.41 A0 44.21 43.85
25 2020-09-07 42.01 42.66 -0.65 A0 42.82 41.41
26 2020-09-08 39.78 42.01 -2.23 A0 42.18 41.53
27 2020-09-09 40.79 39.78 1.01 A0 39.98 37.75
28 2020-09-10 40.06 40.79 -0.73 A0 40.98 41.99
29 2020-09-11 39.83 40.06 -0.23 A0 40.25 39.52
📋 Sample Forecast Results (last 20 rows):
Index Date Y(t) Y(t-1) diff(Y(t)) Fuzzy_State_t-1 Initial_Forecast_F(t) Final_Forecast_F^(t)
1012 2024-07-03 87.34 86.24 1.10 A3 85.89 85.53
1013 2024-07-04 87.43 87.34 0.09 A3 86.89 87.99
1014 2024-07-05 86.54 87.43 -0.89 A3 86.97 87.06
1015 2024-07-08 85.75 86.54 -0.79 A3 86.16 85.27
1016 2024-07-09 84.66 85.75 -1.09 A3 85.44 84.65
1017 2024-07-10 85.08 84.66 0.42 A3 84.44 83.35
1018 2024-07-11 85.40 85.08 0.32 A3 84.83 85.25
1019 2024-07-12 85.03 85.40 -0.37 A3 85.12 85.44
1020 2024-07-15 84.85 85.03 -0.18 A3 84.78 84.41
1021 2024-07-16 83.73 84.85 -1.12 A3 84.62 84.44
1022 2024-07-17 85.08 83.73 1.35 A3 83.59 82.47
1023 2024-07-18 85.11 85.08 0.03 A3 84.83 86.18
1024 2024-07-19 82.63 85.11 -2.48 A3 84.85 84.88
1025 2024-07-22 82.40 82.63 -0.23 A3 82.59 80.11
1026 2024-07-23 81.01 82.40 -1.39 A3 82.38 82.15
1027 2024-07-24 81.71 81.01 0.70 A3 81.11 79.72
1028 2024-07-25 82.37 81.71 0.66 A3 81.75 82.45
1029 2024-07-26 81.13 82.37 -1.24 A3 82.35 83.01
1030 2024-07-29 79.78 81.13 -1.35 A3 81.22 79.98
1031 2024-07-30 78.63 79.78 -1.15 A3 79.99 78.64
In [62]:
# ============================================================================
# STEP 6: CALCULATE FORECASTING ACCURACY (INITIAL vs FINAL)
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 6: Calculating Forecasting Accuracy (Initial vs Final)")
print("-"*100)
# Remove first row (no forecast for first data point)
df_forecast_eval = df_forecast_train[1:].copy()
# Actual values
actual = df_forecast_eval['Y(t)'].values
# Initial forecast values
initial_pred = df_forecast_eval['Initial_Forecast_F(t)'].values
# Final forecast values
final_pred = df_forecast_eval['Final_Forecast_F^(t)'].values
# ===== INITIAL FORECAST ACCURACY =====
print("\n" + "="*60)
print("📈 INITIAL FORECAST ACCURACY (F(t))")
print("="*60)
initial_mae = mean_absolute_error(actual, initial_pred)
initial_mse = mean_squared_error(actual, initial_pred)
initial_rmse = np.sqrt(initial_mse)
initial_mape = mean_absolute_percentage_error(actual, initial_pred) * 100
print(f"\n MAE (Mean Absolute Error) : ${initial_mae:.4f}")
print(f" MSE (Mean Squared Error) : ${initial_mse:.4f}")
print(f" RMSE (Root Mean Squared Error) : ${initial_rmse:.4f}")
print(f" MAPE (Mean Absolute Percentage Error): {initial_mape:.4f}%")
# Interpretasi akurasi
def interpret_accuracy(mae_value):
if mae_value <= 10:
return "EXCELLENT ⭐⭐⭐"
elif mae_value <= 20:
return "GOOD ⭐⭐"
elif mae_value <= 50:
return "FAIR ⭐"
else:
return "POOR"
initial_accuracy_level = interpret_accuracy(initial_mae)
print(f"\n Accuracy Level: {initial_accuracy_level}")
# ===== FINAL FORECAST ACCURACY =====
print("\n" + "="*60)
print("📈 FINAL FORECAST ACCURACY (F^(t) - Adjusted)")
print("="*60)
final_mae = mean_absolute_error(actual, final_pred)
final_mse = mean_squared_error(actual, final_pred)
final_rmse = np.sqrt(final_mse)
final_mape = mean_absolute_percentage_error(actual, final_pred) * 100
print(f"\n MAE (Mean Absolute Error) : ${final_mae:.4f}")
print(f" MSE (Mean Squared Error) : ${final_mse:.4f}")
print(f" RMSE (Root Mean Squared Error) : ${final_rmse:.4f}")
print(f" MAPE (Mean Absolute Percentage Error): {final_mape:.4f}%")
final_accuracy_level = interpret_accuracy(final_mae)
print(f"\n Accuracy Level: {final_accuracy_level}")
# ===== COMPARISON =====
print("\n" + "="*60)
print("🔍 COMPARISON: INITIAL vs FINAL FORECAST")
print("="*60)
comparison_data = {
'Metric': ['MAE', 'MSE', 'RMSE', 'MAPE (%)'],
'Initial_F(t)': [initial_mae, initial_mse, initial_rmse, initial_mape],
'Final_F^(t)': [final_mae, final_mse, final_rmse, final_mape],
'Improvement': [
initial_mae - final_mae,
initial_mse - final_mse,
initial_rmse - final_rmse,
initial_mape - final_mape
]
}
df_comparison = pd.DataFrame(comparison_data)
print("\n" + df_comparison.to_string(index=False, float_format=lambda x: f'{x:.4f}'))
# Determine which is better
print("\n🏆 WINNER:")
if final_mae < initial_mae:
improvement_pct = ((initial_mae - final_mae) / initial_mae) * 100
print(f" Final Forecast (Adjusted) is BETTER by {improvement_pct:.2f}%")
print(f" ✅ Adjustment with diff(Y(t)) IMPROVES accuracy!")
else:
degradation_pct = ((final_mae - initial_mae) / initial_mae) * 100
print(f" Initial Forecast is BETTER by {degradation_pct:.2f}%")
print(f" ⚠️ Adjustment with diff(Y(t)) DECREASES accuracy!")
====================================================================================================
📊 STEP 6: Calculating Forecasting Accuracy (Initial vs Final)
----------------------------------------------------------------------------------------------------
============================================================
📈 INITIAL FORECAST ACCURACY (F(t))
============================================================
MAE (Mean Absolute Error) : $1.4494
MSE (Mean Squared Error) : $4.1424
RMSE (Root Mean Squared Error) : $2.0353
MAPE (Mean Absolute Percentage Error): 1.8210%
Accuracy Level: EXCELLENT ⭐⭐⭐
============================================================
📈 FINAL FORECAST ACCURACY (F^(t) - Adjusted)
============================================================
MAE (Mean Absolute Error) : $1.9680
MSE (Mean Squared Error) : $7.6556
RMSE (Root Mean Squared Error) : $2.7669
MAPE (Mean Absolute Percentage Error): 2.4659%
Accuracy Level: EXCELLENT ⭐⭐⭐
============================================================
🔍 COMPARISON: INITIAL vs FINAL FORECAST
============================================================
Metric Initial_F(t) Final_F^(t) Improvement
MAE 1.4494 1.9680 -0.5187
MSE 4.1424 7.6556 -3.5132
RMSE 2.0353 2.7669 -0.7316
MAPE (%) 1.8210 2.4659 -0.6450
🏆 WINNER:
Initial Forecast is BETTER by 35.79%
⚠️ Adjustment with diff(Y(t)) DECREASES accuracy!
In [63]:
# ============================================================================
# STEP 7: VISUALIZE FORECASTING RESULTS (INITIAL vs FINAL)
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 7: Visualizing Forecasting Results (Initial vs Final)")
print("-"*100)
# Plot 1: Actual vs Initial vs Final (Full Time Series)
plt.figure(figsize=(20, 7))
plt.plot(df_forecast_train['Date'], df_forecast_train['Y(t)'],
label='Actual Price', color='blue', linewidth=2.5, alpha=0.8)
plt.plot(df_forecast_train['Date'], df_forecast_train['Initial_Forecast_F(t)'],
label='Initial Forecast F(t)', color='orange', linewidth=1.5, alpha=0.7)
plt.plot(df_forecast_train['Date'], df_forecast_train['Final_Forecast_F^(t)'],
label='Final Forecast F^(t) (Adjusted)', color='red', linewidth=1.5, alpha=0.7)
plt.xlabel('Date', fontsize=13, fontweight='bold')
plt.ylabel('Price (USD)', fontsize=13, fontweight='bold')
plt.title('Brent Oil Price: Actual vs FTSMC Forecasts (Training Data)\nInitial F(t) vs Final F^(t)',
fontsize=15, fontweight='bold')
plt.legend(fontsize=11, loc='upper left')
plt.grid(True, alpha=0.3, linestyle='--')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
==================================================================================================== 📊 STEP 7: Visualizing Forecasting Results (Initial vs Final) ----------------------------------------------------------------------------------------------------
In [64]:
# Plot 2: Detailed view (First 100 days)
plt.figure(figsize=(18, 7))
n_display = min(100, len(df_forecast_train))
plt.plot(df_forecast_train['Date'][:n_display], df_forecast_train['Y(t)'][:n_display],
'o-', label='Actual Price', color='blue', linewidth=2.5, markersize=6)
plt.plot(df_forecast_train['Date'][:n_display], df_forecast_train['Initial_Forecast_F(t)'][:n_display],
's-', label='Initial Forecast F(t)', color='orange', linewidth=2, markersize=5, alpha=0.7)
plt.plot(df_forecast_train['Date'][:n_display], df_forecast_train['Final_Forecast_F^(t)'][:n_display],
'^-', label='Final Forecast F^(t)', color='red', linewidth=2, markersize=5, alpha=0.7)
plt.xlabel('Date', fontsize=13, fontweight='bold')
plt.ylabel('Price (USD)', fontsize=13, fontweight='bold')
plt.title(f'Detailed View: Actual vs FTSMC Forecasts (First {n_display} Days)',
fontsize=15, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3, linestyle='--')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
In [65]:
# Plot 3: Forecast Errors Comparison
errors_initial = df_forecast_eval['Y(t)'] - df_forecast_eval['Initial_Forecast_F(t)']
errors_final = df_forecast_eval['Y(t)'] - df_forecast_eval['Final_Forecast_F^(t)']
fig, axes = plt.subplots(2, 2, figsize=(20, 12))
# Error over time - Initial
axes[0, 0].plot(df_forecast_eval['Date'], errors_initial, color='orange', linewidth=1, alpha=0.6)
axes[0, 0].axhline(y=0, color='black', linestyle='--', linewidth=1.5)
axes[0, 0].fill_between(df_forecast_eval['Date'], errors_initial, 0,
where=(errors_initial > 0), color='green', alpha=0.3, label='Overestimate')
axes[0, 0].fill_between(df_forecast_eval['Date'], errors_initial, 0,
where=(errors_initial < 0), color='red', alpha=0.3, label='Underestimate')
axes[0, 0].set_xlabel('Date', fontsize=12, fontweight='bold')
axes[0, 0].set_ylabel('Forecast Error (USD)', fontsize=12, fontweight='bold')
axes[0, 0].set_title('Initial Forecast Errors Over Time', fontsize=13, fontweight='bold')
axes[0, 0].legend(fontsize=10)
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].tick_params(axis='x', rotation=45)
# Error over time - Final
axes[0, 1].plot(df_forecast_eval['Date'], errors_final, color='red', linewidth=1, alpha=0.6)
axes[0, 1].axhline(y=0, color='black', linestyle='--', linewidth=1.5)
axes[0, 1].fill_between(df_forecast_eval['Date'], errors_final, 0,
where=(errors_final > 0), color='green', alpha=0.3, label='Overestimate')
axes[0, 1].fill_between(df_forecast_eval['Date'], errors_final, 0,
where=(errors_final < 0), color='red', alpha=0.3, label='Underestimate')
axes[0, 1].set_xlabel('Date', fontsize=12, fontweight='bold')
axes[0, 1].set_ylabel('Forecast Error (USD)', fontsize=12, fontweight='bold')
axes[0, 1].set_title('Final Forecast Errors Over Time', fontsize=13, fontweight='bold')
axes[0, 1].legend(fontsize=10)
axes[0, 1].grid(True, alpha=0.3)
axes[0, 1].tick_params(axis='x', rotation=45)
# Error distribution - Initial
axes[1, 0].hist(errors_initial, bins=50, color='orange', alpha=0.7, edgecolor='black')
axes[1, 0].axvline(x=0, color='red', linestyle='--', linewidth=2, label='Zero Error')
axes[1, 0].axvline(x=errors_initial.mean(), color='green', linestyle='--', linewidth=2,
label=f'Mean Error: ${errors_initial.mean():.2f}')
axes[1, 0].set_xlabel('Forecast Error (USD)', fontsize=12, fontweight='bold')
axes[1, 0].set_ylabel('Frequency', fontsize=12, fontweight='bold')
axes[1, 0].set_title('Distribution of Initial Forecast Errors', fontsize=13, fontweight='bold')
axes[1, 0].legend(fontsize=10)
axes[1, 0].grid(True, alpha=0.3, axis='y')
# Error distribution - Final
axes[1, 1].hist(errors_final, bins=50, color='red', alpha=0.7, edgecolor='black')
axes[1, 1].axvline(x=0, color='red', linestyle='--', linewidth=2, label='Zero Error')
axes[1, 1].axvline(x=errors_final.mean(), color='green', linestyle='--', linewidth=2,
label=f'Mean Error: ${errors_final.mean():.2f}')
axes[1, 1].set_xlabel('Forecast Error (USD)', fontsize=12, fontweight='bold')
axes[1, 1].set_ylabel('Frequency', fontsize=12, fontweight='bold')
axes[1, 1].set_title('Distribution of Final Forecast Errors', fontsize=13, fontweight='bold')
axes[1, 1].legend(fontsize=10)
axes[1, 1].grid(True, alpha=0.3, axis='y')
plt.suptitle('Forecast Errors Comparison: Initial vs Final', fontsize=16, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()
In [66]:
# Plot 4: Scatter Plots (Actual vs Predicted)
fig, axes = plt.subplots(1, 2, figsize=(18, 8))
# Scatter - Initial
axes[0].scatter(df_forecast_eval['Y(t)'], df_forecast_eval['Initial_Forecast_F(t)'],
alpha=0.5, s=30, c='orange', edgecolors='black', linewidths=0.5)
min_val = min(df_forecast_eval['Y(t)'].min(), df_forecast_eval['Initial_Forecast_F(t)'].min())
max_val = max(df_forecast_eval['Y(t)'].max(), df_forecast_eval['Initial_Forecast_F(t)'].max())
axes[0].plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=2, label='Perfect Prediction')
axes[0].set_xlabel('Actual Price (USD)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Predicted Price (USD)', fontsize=12, fontweight='bold')
axes[0].set_title(f'Initial Forecast\nMAE: ${initial_mae:.2f}, RMSE: ${initial_rmse:.2f}',
fontsize=13, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(True, alpha=0.3)
# Scatter - Final
axes[1].scatter(df_forecast_eval['Y(t)'], df_forecast_eval['Final_Forecast_F^(t)'],
alpha=0.5, s=30, c='red', edgecolors='black', linewidths=0.5)
min_val = min(df_forecast_eval['Y(t)'].min(), df_forecast_eval['Final_Forecast_F^(t)'].min())
max_val = max(df_forecast_eval['Y(t)'].max(), df_forecast_eval['Final_Forecast_F^(t)'].max())
axes[1].plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=2, label='Perfect Prediction')
axes[1].set_xlabel('Actual Price (USD)', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Predicted Price (USD)', fontsize=12, fontweight='bold')
axes[1].set_title(f'Final Forecast (Adjusted)\nMAE: ${final_mae:.2f}, RMSE: ${final_rmse:.2f}',
fontsize=13, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(True, alpha=0.3)
plt.suptitle('Actual vs Predicted Prices: Initial vs Final Forecast', fontsize=16, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()
In [67]:
# Plot 5: Error Magnitude Comparison
plt.figure(figsize=(18, 7))
abs_errors_initial = np.abs(errors_initial)
abs_errors_final = np.abs(errors_final)
x_range = range(len(abs_errors_initial))
plt.plot(x_range, abs_errors_initial, label='Initial Forecast Error',
color='orange', linewidth=1.5, alpha=0.7)
plt.plot(x_range, abs_errors_final, label='Final Forecast Error',
color='red', linewidth=1.5, alpha=0.7)
plt.axhline(y=initial_mae, color='orange', linestyle='--', linewidth=1.5,
label=f'Initial MAE: ${initial_mae:.2f}')
plt.axhline(y=final_mae, color='red', linestyle='--', linewidth=1.5,
label=f'Final MAE: ${final_mae:.2f}')
plt.xlabel('Observation Index', fontsize=12, fontweight='bold')
plt.ylabel('Absolute Error (USD)', fontsize=12, fontweight='bold')
plt.title('Absolute Forecast Errors: Initial vs Final', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
In [68]:
# ============================================================================
# STEP 8: FORECASTING ON TESTING DATA (ONE-STEP-AHEAD ITERATIVELY)
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 8: Forecasting on Testing Data (One-Step-Ahead)")
print("="*100)
print("\n⚠️ IMPORTANT: Testing forecast is done ONE STEP at a time")
print(" because we need actual Y(t) to calculate diff(Y(t)) for next prediction.")
print("-"*100)
# Assign fuzzy states to test data first
print("\n🔄 Assigning fuzzy states to testing data...")
# Calculate membership for test data
n_test = len(df_test)
membership_matrix_test = np.zeros((n_test, k_optimal))
for i in range(k_optimal):
params = df_fuzzy_sets.iloc[i]
membership_matrix_test[:, i] = trapezoid_membership(
df_test['Price'].values,
params['a'], params['b'], params['c'], params['d']
)
# Add membership columns
for i in range(k_optimal):
df_test[f'mu_A{i}'] = membership_matrix_test[:, i]
# Assign fuzzy states to test data
fuzzy_states_test = []
max_memberships_test = []
for idx in df_test.index:
row = df_test.loc[idx]
state, max_mu = assign_fuzzy_state(row, k_optimal)
fuzzy_states_test.append(state)
max_memberships_test.append(max_mu)
df_test['Fuzzy_State'] = fuzzy_states_test
df_test['Max_Membership'] = max_memberships_test
print("✅ Fuzzy states assigned to testing data!")
# Now perform one-step-ahead forecasting
print("\n🔄 Performing one-step-ahead forecasting on testing data...")
print("-"*100)
forecast_results_test = []
# Get last training data point for initial forecast
last_train_price = df_train.iloc[-1]['Price']
last_train_state = df_train.iloc[-1]['Fuzzy_State']
second_last_train_price = df_train.iloc[-2]['Price']
for i in range(len(df_test)):
current_idx = df_test.index[i]
current_price = df_test.loc[current_idx, 'Price']
if i == 0:
# First test point - use last training data
prev_price = last_train_price
prev_state = last_train_state
diff_yt_minus1 = last_train_price - second_last_train_price
else:
# Use previous test point
prev_idx = df_test.index[i-1]
prev_price = df_test.loc[prev_idx, 'Price']
prev_state = df_test.loc[prev_idx, 'Fuzzy_State']
# Calculate diff from previous step
if i == 1:
diff_yt_minus1 = df_test.iloc[0]['Price'] - last_train_price
else:
diff_yt_minus1 = df_test.iloc[i-1]['Price'] - df_test.iloc[i-2]['Price']
# Calculate diff(Y(t)) for current step (will be used for NEXT prediction)
diff_yt = current_price - prev_price
# Initial forecast F(t)
initial_forecast = forecast_ftsmc_initial(prev_state, prev_price, transition_matrix, midpoints)
# Final forecast F^(t) = F(t) + diff(Y(t-1))
final_forecast = initial_forecast + diff_yt_minus1
forecast_results_test.append({
'Index': i,
'Date': df_test.loc[current_idx, 'Date'],
'Y(t)': current_price,
'Y(t-1)': prev_price,
'diff(Y(t))': diff_yt,
'diff(Y(t-1))': diff_yt_minus1,
'Fuzzy_State_t-1': f'A{prev_state}',
'Initial_Forecast_F(t)': initial_forecast,
'Final_Forecast_F^(t)': final_forecast
})
df_forecast_test = pd.DataFrame(forecast_results_test)
print("\n✅ Testing Forecasting Complete!")
print(f"\n📋 Sample Test Forecast Results (first 30 rows):")
print(df_forecast_test.head(30).to_string(index=False, float_format=lambda x: f'{x:.2f}' if pd.notna(x) else 'NaN'))
print(f"\n📋 Sample Test Forecast Results (last 20 rows):")
print(df_forecast_test.tail(20).to_string(index=False, float_format=lambda x: f'{x:.2f}' if pd.notna(x) else 'NaN'))
# Show step-by-step calculation for first 5 predictions
print("\n" + "="*100)
print("🔍 DETAILED STEP-BY-STEP CALCULATION (First 5 Test Predictions)")
print("="*100)
for i in range(min(5, len(df_forecast_test))):
row = df_forecast_test.iloc[i]
print(f"\n{'─'*100}")
print(f"Prediction {i+1}: {row['Date']}")
print(f"{'─'*100}")
print(f" Step 1: Previous Information")
print(f" • Y(t-1) = ${row['Y(t-1)']:.2f}")
print(f" • Fuzzy State(t-1) = {row['Fuzzy_State_t-1']}")
print(f" • diff(Y(t-1)) = ${row['diff(Y(t-1))']:.2f}")
print(f"\n Step 2: Initial Forecast")
print(f" • F(t) = Weighted forecast based on transition probabilities")
# Show transition probabilities
state_num = int(row['Fuzzy_State_t-1'][1])
probs = transition_matrix[state_num]
print(f" • Transition probabilities from {row['Fuzzy_State_t-1']}:")
for j in range(k_optimal):
if probs[j] > 0:
print(f" - P({row['Fuzzy_State_t-1']} → A{j}) = {probs[j]:.4f} × ${midpoints[j]:.2f}")
print(f" • F(t) = ${row['Initial_Forecast_F(t)']:.2f}")
print(f"\n Step 3: Final Forecast (Adjusted)")
print(f" • F^(t) = F(t) + diff(Y(t-1))")
print(f" • F^(t) = ${row['Initial_Forecast_F(t)']:.2f} + ${row['diff(Y(t-1))']:.2f}")
print(f" • F^(t) = ${row['Final_Forecast_F^(t)']:.2f}")
print(f"\n Step 4: Actual Value & Error")
print(f" • Actual Y(t) = ${row['Y(t)']:.2f}")
print(f" • Initial Error = ${row['Y(t)'] - row['Initial_Forecast_F(t)']:.2f}")
print(f" • Final Error = ${row['Y(t)'] - row['Final_Forecast_F^(t)']:.2f}")
print(f" • diff(Y(t)) = ${row['diff(Y(t))']:.2f} (for next prediction)")
====================================================================================================
📊 STEP 8: Forecasting on Testing Data (One-Step-Ahead)
====================================================================================================
⚠️ IMPORTANT: Testing forecast is done ONE STEP at a time
because we need actual Y(t) to calculate diff(Y(t)) for next prediction.
----------------------------------------------------------------------------------------------------
🔄 Assigning fuzzy states to testing data...
✅ Fuzzy states assigned to testing data!
🔄 Performing one-step-ahead forecasting on testing data...
----------------------------------------------------------------------------------------------------
✅ Testing Forecasting Complete!
📋 Sample Test Forecast Results (first 30 rows):
Index Date Y(t) Y(t-1) diff(Y(t)) diff(Y(t-1)) Fuzzy_State_t-1 Initial_Forecast_F(t) Final_Forecast_F^(t)
0 2024-07-31 80.72 78.63 2.09 -1.15 A2 78.41 77.26
1 2024-08-01 79.52 80.72 -1.20 2.09 A3 80.85 82.94
2 2024-08-02 76.81 79.52 -2.71 -1.20 A2 79.23 78.03
3 2024-08-05 76.30 76.81 -0.51 -2.71 A2 76.74 74.03
4 2024-08-06 76.48 76.30 0.18 -0.51 A2 76.27 75.76
5 2024-08-07 78.33 76.48 1.85 0.18 A2 76.43 76.61
6 2024-08-08 79.16 78.33 0.83 1.85 A2 78.14 79.99
7 2024-08-09 79.66 79.16 0.50 0.83 A2 78.90 79.73
8 2024-08-12 82.30 79.66 2.64 0.50 A2 79.36 79.86
9 2024-08-13 80.69 82.30 -1.61 2.64 A3 82.29 84.93
10 2024-08-14 79.76 80.69 -0.93 -1.61 A3 80.82 79.21
11 2024-08-15 81.04 79.76 1.28 -0.93 A3 79.97 79.04
12 2024-08-16 79.68 81.04 -1.36 1.28 A3 81.14 82.42
13 2024-08-19 77.66 79.68 -2.02 -1.36 A2 79.38 78.02
14 2024-08-20 77.20 77.66 -0.46 -2.02 A2 77.52 75.50
15 2024-08-21 76.05 77.20 -1.15 -0.46 A2 77.10 76.64
16 2024-08-22 77.22 76.05 1.17 -1.15 A2 76.04 74.89
17 2024-08-23 79.02 77.22 1.80 1.17 A2 77.12 78.29
18 2024-08-26 81.43 79.02 2.41 1.80 A2 78.77 80.57
19 2024-08-27 79.55 81.43 -1.88 2.41 A3 81.49 83.90
20 2024-08-28 78.65 79.55 -0.90 -1.88 A2 79.26 77.38
21 2024-08-29 79.94 78.65 1.29 -0.90 A2 78.43 77.53
22 2024-08-30 78.80 79.94 -1.14 1.29 A3 80.13 81.42
23 2024-09-02 77.52 78.80 -1.28 -1.14 A2 78.57 77.43
24 2024-09-03 73.75 77.52 -3.77 -1.28 A2 77.39 76.11
25 2024-09-04 72.70 73.75 -1.05 -3.77 A2 73.92 70.15
26 2024-09-05 72.69 72.70 -0.01 -1.05 A2 72.95 71.90
27 2024-09-06 71.06 72.69 -1.63 -0.01 A2 72.94 72.93
28 2024-09-09 71.84 71.06 0.78 -1.63 A2 71.44 69.81
29 2024-09-10 69.19 71.84 -2.65 0.78 A2 72.16 72.94
📋 Sample Test Forecast Results (last 20 rows):
Index Date Y(t) Y(t-1) diff(Y(t)) diff(Y(t-1)) Fuzzy_State_t-1 Initial_Forecast_F(t) Final_Forecast_F^(t)
239 2025-07-04 68.30 68.80 -0.50 -0.31 A2 69.36 69.05
240 2025-07-07 69.58 68.30 1.28 -0.50 A2 68.89 68.39
241 2025-07-08 70.15 69.58 0.57 1.28 A2 70.07 71.35
242 2025-07-09 70.19 70.15 0.04 0.57 A2 70.60 71.17
243 2025-07-10 68.64 70.19 -1.55 0.04 A2 70.64 70.68
244 2025-07-11 70.36 68.64 1.72 -1.55 A2 69.21 67.66
245 2025-07-14 69.21 70.36 -1.15 1.72 A2 70.79 72.51
246 2025-07-15 68.71 69.21 -0.50 -1.15 A2 69.73 68.58
247 2025-07-16 68.52 68.71 -0.19 -0.50 A2 69.27 68.77
248 2025-07-17 69.52 68.52 1.00 -0.19 A2 69.10 68.91
249 2025-07-18 69.28 69.52 -0.24 1.00 A2 70.02 71.02
250 2025-07-21 68.38 69.28 -0.90 -0.24 A2 69.80 69.56
251 2025-07-22 67.77 68.38 -0.61 -0.90 A2 68.97 68.07
252 2025-07-23 67.83 67.77 0.06 -0.61 A2 68.41 67.80
253 2025-07-24 68.36 67.83 0.53 0.06 A2 68.46 68.52
254 2025-07-25 68.44 68.36 0.08 0.53 A2 68.95 69.48
255 2025-07-28 70.04 68.44 1.60 0.08 A2 69.02 69.10
256 2025-07-29 72.51 70.04 2.47 1.60 A2 70.50 72.10
257 2025-07-30 73.24 72.51 0.73 2.47 A2 72.77 75.24
258 2025-07-31 72.53 73.24 -0.71 0.73 A2 73.45 74.18
====================================================================================================
🔍 DETAILED STEP-BY-STEP CALCULATION (First 5 Test Predictions)
====================================================================================================
────────────────────────────────────────────────────────────────────────────────────────────────────
Prediction 1: 2024-07-31 00:00:00
────────────────────────────────────────────────────────────────────────────────────────────────────
Step 1: Previous Information
• Y(t-1) = $78.63
• Fuzzy State(t-1) = A2
• diff(Y(t-1)) = $-1.15
Step 2: Initial Forecast
• F(t) = Weighted forecast based on transition probabilities
• Transition probabilities from A2:
- P(A2 → A1) = 0.0229 × $55.78
- P(A2 → A2) = 0.9216 × $73.85
- P(A2 → A3) = 0.0556 × $84.16
• F(t) = $78.41
Step 3: Final Forecast (Adjusted)
• F^(t) = F(t) + diff(Y(t-1))
• F^(t) = $78.41 + $-1.15
• F^(t) = $77.26
Step 4: Actual Value & Error
• Actual Y(t) = $80.72
• Initial Error = $2.31
• Final Error = $3.46
• diff(Y(t)) = $2.09 (for next prediction)
────────────────────────────────────────────────────────────────────────────────────────────────────
Prediction 2: 2024-08-01 00:00:00
────────────────────────────────────────────────────────────────────────────────────────────────────
Step 1: Previous Information
• Y(t-1) = $80.72
• Fuzzy State(t-1) = A3
• diff(Y(t-1)) = $2.09
Step 2: Initial Forecast
• F(t) = Weighted forecast based on transition probabilities
• Transition probabilities from A3:
- P(A3 → A2) = 0.0509 × $73.85
- P(A3 → A3) = 0.9132 × $84.16
- P(A3 → A4) = 0.0359 × $93.95
• F(t) = $80.85
Step 3: Final Forecast (Adjusted)
• F^(t) = F(t) + diff(Y(t-1))
• F^(t) = $80.85 + $2.09
• F^(t) = $82.94
Step 4: Actual Value & Error
• Actual Y(t) = $79.52
• Initial Error = $-1.33
• Final Error = $-3.42
• diff(Y(t)) = $-1.20 (for next prediction)
────────────────────────────────────────────────────────────────────────────────────────────────────
Prediction 3: 2024-08-02 00:00:00
────────────────────────────────────────────────────────────────────────────────────────────────────
Step 1: Previous Information
• Y(t-1) = $79.52
• Fuzzy State(t-1) = A2
• diff(Y(t-1)) = $-1.20
Step 2: Initial Forecast
• F(t) = Weighted forecast based on transition probabilities
• Transition probabilities from A2:
- P(A2 → A1) = 0.0229 × $55.78
- P(A2 → A2) = 0.9216 × $73.85
- P(A2 → A3) = 0.0556 × $84.16
• F(t) = $79.23
Step 3: Final Forecast (Adjusted)
• F^(t) = F(t) + diff(Y(t-1))
• F^(t) = $79.23 + $-1.20
• F^(t) = $78.03
Step 4: Actual Value & Error
• Actual Y(t) = $76.81
• Initial Error = $-2.42
• Final Error = $-1.22
• diff(Y(t)) = $-2.71 (for next prediction)
────────────────────────────────────────────────────────────────────────────────────────────────────
Prediction 4: 2024-08-05 00:00:00
────────────────────────────────────────────────────────────────────────────────────────────────────
Step 1: Previous Information
• Y(t-1) = $76.81
• Fuzzy State(t-1) = A2
• diff(Y(t-1)) = $-2.71
Step 2: Initial Forecast
• F(t) = Weighted forecast based on transition probabilities
• Transition probabilities from A2:
- P(A2 → A1) = 0.0229 × $55.78
- P(A2 → A2) = 0.9216 × $73.85
- P(A2 → A3) = 0.0556 × $84.16
• F(t) = $76.74
Step 3: Final Forecast (Adjusted)
• F^(t) = F(t) + diff(Y(t-1))
• F^(t) = $76.74 + $-2.71
• F^(t) = $74.03
Step 4: Actual Value & Error
• Actual Y(t) = $76.30
• Initial Error = $-0.44
• Final Error = $2.27
• diff(Y(t)) = $-0.51 (for next prediction)
────────────────────────────────────────────────────────────────────────────────────────────────────
Prediction 5: 2024-08-06 00:00:00
────────────────────────────────────────────────────────────────────────────────────────────────────
Step 1: Previous Information
• Y(t-1) = $76.30
• Fuzzy State(t-1) = A2
• diff(Y(t-1)) = $-0.51
Step 2: Initial Forecast
• F(t) = Weighted forecast based on transition probabilities
• Transition probabilities from A2:
- P(A2 → A1) = 0.0229 × $55.78
- P(A2 → A2) = 0.9216 × $73.85
- P(A2 → A3) = 0.0556 × $84.16
• F(t) = $76.27
Step 3: Final Forecast (Adjusted)
• F^(t) = F(t) + diff(Y(t-1))
• F^(t) = $76.27 + $-0.51
• F^(t) = $75.76
Step 4: Actual Value & Error
• Actual Y(t) = $76.48
• Initial Error = $0.21
• Final Error = $0.72
• diff(Y(t)) = $0.18 (for next prediction)
In [69]:
# ============================================================================
# STEP 9: CALCULATE TESTING ACCURACY (INITIAL vs FINAL)
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 9: Calculating Testing Accuracy (Initial vs Final)")
print("="*100)
# Actual values
actual_test = df_forecast_test['Y(t)'].values
# Initial forecast values
initial_pred_test = df_forecast_test['Initial_Forecast_F(t)'].values
# Final forecast values
final_pred_test = df_forecast_test['Final_Forecast_F^(t)'].values
# ===== INITIAL FORECAST ACCURACY =====
print("\n" + "="*60)
print("📈 TESTING - INITIAL FORECAST ACCURACY (F(t))")
print("="*60)
initial_mae_test = mean_absolute_error(actual_test, initial_pred_test)
initial_mse_test = mean_squared_error(actual_test, initial_pred_test)
initial_rmse_test = np.sqrt(initial_mse_test)
initial_mape_test = mean_absolute_percentage_error(actual_test, initial_pred_test) * 100
print(f"\n MAE (Mean Absolute Error) : ${initial_mae_test:.4f}")
print(f" MSE (Mean Squared Error) : ${initial_mse_test:.4f}")
print(f" RMSE (Root Mean Squared Error) : ${initial_rmse_test:.4f}")
print(f" MAPE (Mean Absolute Percentage Error): {initial_mape_test:.4f}%")
initial_accuracy_level_test = interpret_accuracy(initial_mae_test)
print(f"\n Accuracy Level: {initial_accuracy_level_test}")
# ===== FINAL FORECAST ACCURACY =====
print("\n" + "="*60)
print("📈 TESTING - FINAL FORECAST ACCURACY (F^(t) - Adjusted)")
print("="*60)
final_mae_test = mean_absolute_error(actual_test, final_pred_test)
final_mse_test = mean_squared_error(actual_test, final_pred_test)
final_rmse_test = np.sqrt(final_mse_test)
final_mape_test = mean_absolute_percentage_error(actual_test, final_pred_test) * 100
print(f"\n MAE (Mean Absolute Error) : ${final_mae_test:.4f}")
print(f" MSE (Mean Squared Error) : ${final_mse_test:.4f}")
print(f" RMSE (Root Mean Squared Error) : ${final_rmse_test:.4f}")
print(f" MAPE (Mean Absolute Percentage Error): {final_mape_test:.4f}%")
final_accuracy_level_test = interpret_accuracy(final_mae_test)
print(f"\n Accuracy Level: {final_accuracy_level_test}")
# ===== COMPARISON =====
print("\n" + "="*60)
print("🔍 TESTING COMPARISON: INITIAL vs FINAL FORECAST")
print("="*60)
comparison_data_test = {
'Metric': ['MAE', 'MSE', 'RMSE', 'MAPE (%)'],
'Initial_F(t)': [initial_mae_test, initial_mse_test, initial_rmse_test, initial_mape_test],
'Final_F^(t)': [final_mae_test, final_mse_test, final_rmse_test, final_mape_test],
'Improvement': [
initial_mae_test - final_mae_test,
initial_mse_test - final_mse_test,
initial_rmse_test - final_rmse_test,
initial_mape_test - final_mape_test
]
}
df_comparison_test = pd.DataFrame(comparison_data_test)
print("\n" + df_comparison_test.to_string(index=False, float_format=lambda x: f'{x:.4f}'))
# Determine which is better
print("\n🏆 TESTING WINNER:")
if final_mae_test < initial_mae_test:
improvement_pct_test = ((initial_mae_test - final_mae_test) / initial_mae_test) * 100
print(f" Final Forecast (Adjusted) is BETTER by {improvement_pct_test:.2f}%")
print(f" ✅ Adjustment with diff(Y(t)) IMPROVES accuracy on testing data!")
else:
degradation_pct_test = ((final_mae_test - initial_mae_test) / initial_mae_test) * 100
print(f" Initial Forecast is BETTER by {degradation_pct_test:.2f}%")
print(f" ⚠️ Adjustment with diff(Y(t)) DECREASES accuracy on testing data!")
# ===== TRAINING vs TESTING COMPARISON =====
print("\n" + "="*100)
print("🔍 OVERALL COMPARISON: TRAINING vs TESTING PERFORMANCE")
print("="*100)
overall_comparison = {
'Dataset': ['Training', 'Training', 'Testing', 'Testing'],
'Forecast_Type': ['Initial F(t)', 'Final F^(t)', 'Initial F(t)', 'Final F^(t)'],
'MAE': [initial_mae, final_mae, initial_mae_test, final_mae_test],
'MSE': [initial_mse, final_mse, initial_mse_test, final_mse_test],
'RMSE': [initial_rmse, final_rmse, initial_rmse_test, final_rmse_test],
'MAPE (%)': [initial_mape, final_mape, initial_mape_test, final_mape_test]
}
df_overall_comparison = pd.DataFrame(overall_comparison)
print("\n" + df_overall_comparison.to_string(index=False, float_format=lambda x: f'{x:.4f}'))
====================================================================================================
📊 STEP 9: Calculating Testing Accuracy (Initial vs Final)
====================================================================================================
============================================================
📈 TESTING - INITIAL FORECAST ACCURACY (F(t))
============================================================
MAE (Mean Absolute Error) : $1.0985
MSE (Mean Squared Error) : $2.1239
RMSE (Root Mean Squared Error) : $1.4574
MAPE (Mean Absolute Percentage Error): 1.5314%
Accuracy Level: EXCELLENT ⭐⭐⭐
============================================================
📈 TESTING - FINAL FORECAST ACCURACY (F^(t) - Adjusted)
============================================================
MAE (Mean Absolute Error) : $1.5766
MSE (Mean Squared Error) : $4.0474
RMSE (Root Mean Squared Error) : $2.0118
MAPE (Mean Absolute Percentage Error): 2.1876%
Accuracy Level: EXCELLENT ⭐⭐⭐
============================================================
🔍 TESTING COMPARISON: INITIAL vs FINAL FORECAST
============================================================
Metric Initial_F(t) Final_F^(t) Improvement
MAE 1.0985 1.5766 -0.4782
MSE 2.1239 4.0474 -1.9235
RMSE 1.4574 2.0118 -0.5544
MAPE (%) 1.5314 2.1876 -0.6562
🏆 TESTING WINNER:
Initial Forecast is BETTER by 43.53%
⚠️ Adjustment with diff(Y(t)) DECREASES accuracy on testing data!
====================================================================================================
🔍 OVERALL COMPARISON: TRAINING vs TESTING PERFORMANCE
====================================================================================================
Dataset Forecast_Type MAE MSE RMSE MAPE (%)
Training Initial F(t) 1.4494 4.1424 2.0353 1.8210
Training Final F^(t) 1.9680 7.6556 2.7669 2.4659
Testing Initial F(t) 1.0985 2.1239 1.4574 1.5314
Testing Final F^(t) 1.5766 4.0474 2.0118 2.1876
In [70]:
# ============================================================================
# STEP 10: VISUALIZE TESTING RESULTS
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 10: Visualizing Testing Results")
print("="*100)
# Plot 1: Testing Data - Actual vs Forecasts
plt.figure(figsize=(20, 7))
plt.plot(df_forecast_test['Date'], df_forecast_test['Y(t)'],
label='Actual Price', color='blue', linewidth=2.5, alpha=0.8, marker='o', markersize=4)
plt.plot(df_forecast_test['Date'], df_forecast_test['Initial_Forecast_F(t)'],
label='Initial Forecast F(t)', color='orange', linewidth=1.8, alpha=0.7,
marker='o', markersize=3)
plt.plot(df_forecast_test['Date'], df_forecast_test['Final_Forecast_F^(t)'],
label='Final Forecast F^(t) (Adjusted)', color='red', linewidth=1.8, alpha=0.7,
marker='o', markersize=3)
plt.xlabel('Date', fontsize=13, fontweight='bold')
plt.ylabel('Price (USD)', fontsize=13, fontweight='bold')
plt.title('Brent Oil Price: Testing Data - Actual vs FTSMC Forecasts\nInitial F(t) vs Final F^(t)',
fontsize=15, fontweight='bold')
plt.legend(fontsize=11, loc='upper left')
plt.grid(True, alpha=0.3, linestyle='--')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
==================================================================================================== 📊 STEP 10: Visualizing Testing Results ====================================================================================================
In [71]:
# Plot 2: Testing Errors
errors_initial_test = df_forecast_test['Y(t)'] - df_forecast_test['Initial_Forecast_F(t)']
errors_final_test = df_forecast_test['Y(t)'] - df_forecast_test['Final_Forecast_F^(t)']
fig, axes = plt.subplots(2, 2, figsize=(20, 12))
# Error over time - Initial
axes[0, 0].plot(df_forecast_test['Date'], errors_initial_test, color='orange', linewidth=1.5, alpha=0.7)
axes[0, 0].axhline(y=0, color='black', linestyle='--', linewidth=1.5)
axes[0, 0].fill_between(df_forecast_test['Date'], errors_initial_test, 0,
where=(errors_initial_test > 0), color='green', alpha=0.3, label='Overestimate')
axes[0, 0].fill_between(df_forecast_test['Date'], errors_initial_test, 0,
where=(errors_initial_test < 0), color='red', alpha=0.3, label='Underestimate')
axes[0, 0].set_xlabel('Date', fontsize=12, fontweight='bold')
axes[0, 0].set_ylabel('Forecast Error (USD)', fontsize=12, fontweight='bold')
axes[0, 0].set_title('Testing: Initial Forecast Errors Over Time', fontsize=13, fontweight='bold')
axes[0, 0].legend(fontsize=10)
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].tick_params(axis='x', rotation=45)
# Error over time - Final
axes[0, 1].plot(df_forecast_test['Date'], errors_final_test, color='red', linewidth=1.5, alpha=0.7)
axes[0, 1].axhline(y=0, color='black', linestyle='--', linewidth=1.5)
axes[0, 1].fill_between(df_forecast_test['Date'], errors_final_test, 0,
where=(errors_final_test > 0), color='green', alpha=0.3, label='Overestimate')
axes[0, 1].fill_between(df_forecast_test['Date'], errors_final_test, 0,
where=(errors_final_test < 0), color='red', alpha=0.3, label='Underestimate')
axes[0, 1].set_xlabel('Date', fontsize=12, fontweight='bold')
axes[0, 1].set_ylabel('Forecast Error (USD)', fontsize=12, fontweight='bold')
axes[0, 1].set_title('Testing: Final Forecast Errors Over Time', fontsize=13, fontweight='bold')
axes[0, 1].legend(fontsize=10)
axes[0, 1].grid(True, alpha=0.3)
axes[0, 1].tick_params(axis='x', rotation=45)
# Error distribution - Initial
axes[1, 0].hist(errors_initial_test, bins=30, color='orange', alpha=0.7, edgecolor='black')
axes[1, 0].axvline(x=0, color='red', linestyle='--', linewidth=2, label='Zero Error')
axes[1, 0].axvline(x=errors_initial_test.mean(), color='green', linestyle='--', linewidth=2,
label=f'Mean Error: ${errors_initial_test.mean():.2f}')
axes[1, 0].set_xlabel('Forecast Error (USD)', fontsize=12, fontweight='bold')
axes[1, 0].set_ylabel('Frequency', fontsize=12, fontweight='bold')
axes[1, 0].set_title('Testing: Distribution of Initial Forecast Errors', fontsize=13, fontweight='bold')
axes[1, 0].legend(fontsize=10)
axes[1, 0].grid(True, alpha=0.3, axis='y')
# Error distribution - Final
axes[1, 1].hist(errors_final_test, bins=30, color='red', alpha=0.7, edgecolor='black')
axes[1, 1].axvline(x=0, color='red', linestyle='--', linewidth=2, label='Zero Error')
axes[1, 1].axvline(x=errors_final_test.mean(), color='green', linestyle='--', linewidth=2,
label=f'Mean Error: ${errors_final_test.mean():.2f}')
axes[1, 1].set_xlabel('Forecast Error (USD)', fontsize=12, fontweight='bold')
axes[1, 1].set_ylabel('Frequency', fontsize=12, fontweight='bold')
axes[1, 1].set_title('Testing: Distribution of Final Forecast Errors', fontsize=13, fontweight='bold')
axes[1, 1].legend(fontsize=10)
axes[1, 1].grid(True, alpha=0.3, axis='y')
plt.suptitle('Testing Forecast Errors: Initial vs Final', fontsize=16, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()
In [72]:
# Plot 3: Scatter Plots for Testing
fig, axes = plt.subplots(1, 2, figsize=(18, 8))
# Scatter - Initial
axes[0].scatter(df_forecast_test['Y(t)'], df_forecast_test['Initial_Forecast_F(t)'],
alpha=0.6, s=40, c='orange', edgecolors='black', linewidths=0.5)
min_val = min(df_forecast_test['Y(t)'].min(), df_forecast_test['Initial_Forecast_F(t)'].min())
max_val = max(df_forecast_test['Y(t)'].max(), df_forecast_test['Initial_Forecast_F(t)'].max())
axes[0].plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=2, label='Perfect Prediction')
axes[0].set_xlabel('Actual Price (USD)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Predicted Price (USD)', fontsize=12, fontweight='bold')
axes[0].set_title(f'Testing: Initial Forecast\nMAE: ${initial_mae_test:.2f}, RMSE: ${initial_rmse_test:.2f}',
fontsize=13, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(True, alpha=0.3)
# Scatter - Final
axes[1].scatter(df_forecast_test['Y(t)'], df_forecast_test['Final_Forecast_F^(t)'],
alpha=0.6, s=40, c='red', edgecolors='black', linewidths=0.5)
min_val = min(df_forecast_test['Y(t)'].min(), df_forecast_test['Final_Forecast_F^(t)'].min())
max_val = max(df_forecast_test['Y(t)'].max(), df_forecast_test['Final_Forecast_F^(t)'].max())
axes[1].plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=2, label='Perfect Prediction')
axes[1].set_xlabel('Actual Price (USD)', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Predicted Price (USD)', fontsize=12, fontweight='bold')
axes[1].set_title(f'Testing: Final Forecast (Adjusted)\nMAE: ${final_mae_test:.2f}, RMSE: ${final_rmse_test:.2f}',
fontsize=13, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(True, alpha=0.3)
plt.suptitle('Testing: Actual vs Predicted Prices', fontsize=16, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()
In [73]:
# Plot 4: Combined Training + Testing Visualization
print("\n🔄 Creating combined Training + Testing visualization...")
# Prepare combined data
df_train_plot = df_forecast_train.copy()
df_train_plot['Dataset'] = 'Training'
df_test_plot = df_forecast_test.copy()
df_test_plot['Dataset'] = 'Testing'
# Plot Combined
fig, axes = plt.subplots(2, 1, figsize=(22, 12))
# Subplot 1: Initial Forecast
axes[0].plot(df_train_plot['Date'], df_train_plot['Y(t)'],
label='Actual (Training)', color='blue', linewidth=2, alpha=0.7)
axes[0].plot(df_test_plot['Date'], df_test_plot['Y(t)'],
label='Actual (Testing)', color='darkblue', linewidth=2, alpha=0.7)
axes[0].plot(df_train_plot['Date'], df_train_plot['Initial_Forecast_F(t)'],
label='Initial Forecast (Training)', color='orange', linewidth=1.5, alpha=0.6)
axes[0].plot(df_test_plot['Date'], df_test_plot['Initial_Forecast_F(t)'],
label='Initial Forecast (Testing)', color='darkorange', linewidth=1.5, alpha=0.6)
# Add vertical line to separate training and testing
split_date = df_test_plot['Date'].iloc[0]
axes[0].axvline(split_date, color='red', linestyle='-.', linewidth=2.5, alpha=0.7, label='Train/Test Split')
axes[0].set_xlabel('Date', fontsize=13, fontweight='bold')
axes[0].set_ylabel('Price (USD)', fontsize=13, fontweight='bold')
axes[0].set_title('Complete Dataset: Initial Forecast F(t)\nTraining + Testing',
fontsize=15, fontweight='bold')
axes[0].legend(fontsize=10, loc='upper left')
axes[0].grid(True, alpha=0.3, linestyle='--')
axes[0].tick_params(axis='x', rotation=45)
# Subplot 2: Final Forecast
axes[1].plot(df_train_plot['Date'], df_train_plot['Y(t)'],
label='Actual (Training)', color='blue', linewidth=2, alpha=0.7)
axes[1].plot(df_test_plot['Date'], df_test_plot['Y(t)'],
label='Actual (Testing)', color='darkblue', linewidth=2, alpha=0.7)
axes[1].plot(df_train_plot['Date'], df_train_plot['Final_Forecast_F^(t)'],
label='Final Forecast (Training)', color='red', linewidth=1.5, alpha=0.6)
axes[1].plot(df_test_plot['Date'], df_test_plot['Final_Forecast_F^(t)'],
label='Final Forecast (Testing)', color='darkred', linewidth=1.5, alpha=0.6)
# Add vertical line
axes[1].axvline(split_date, color='red', linestyle='-.', linewidth=2.5, alpha=0.7, label='Train/Test Split')
axes[1].set_xlabel('Date', fontsize=13, fontweight='bold')
axes[1].set_ylabel('Price (USD)', fontsize=13, fontweight='bold')
axes[1].set_title('Complete Dataset: Final Forecast F^(t) (Adjusted)\nTraining + Testing',
fontsize=15, fontweight='bold')
axes[1].legend(fontsize=10, loc='upper left')
axes[1].grid(True, alpha=0.3, linestyle='--')
axes[1].tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
🔄 Creating combined Training + Testing visualization...
In [74]:
# Plot 5: Side-by-side comparison of errors
fig, axes = plt.subplots(1, 2, figsize=(20, 7))
# Training errors
axes[0].plot(df_forecast_eval['Date'], np.abs(errors_initial),
label=f'Initial (MAE: ${initial_mae:.2f})', color='orange', linewidth=1.5, alpha=0.7)
axes[0].plot(df_forecast_eval['Date'], np.abs(errors_final),
label=f'Final (MAE: ${final_mae:.2f})', color='red', linewidth=1.5, alpha=0.7)
axes[0].axhline(y=initial_mae, color='orange', linestyle='--', linewidth=1.5, alpha=0.5)
axes[0].axhline(y=final_mae, color='red', linestyle='--', linewidth=1.5, alpha=0.5)
axes[0].set_xlabel('Date', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Absolute Error (USD)', fontsize=12, fontweight='bold')
axes[0].set_title('Training: Absolute Errors', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)
axes[0].tick_params(axis='x', rotation=45)
# Testing errors
axes[1].plot(df_forecast_test['Date'], np.abs(errors_initial_test),
label=f'Initial (MAE: ${initial_mae_test:.2f})', color='orange', linewidth=1.5, alpha=0.7)
axes[1].plot(df_forecast_test['Date'], np.abs(errors_final_test),
label=f'Final (MAE: ${final_mae_test:.2f})', color='red', linewidth=1.5, alpha=0.7)
axes[1].axhline(y=initial_mae_test, color='orange', linestyle='--', linewidth=1.5, alpha=0.5)
axes[1].axhline(y=final_mae_test, color='red', linestyle='--', linewidth=1.5, alpha=0.5)
axes[1].set_xlabel('Date', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Absolute Error (USD)', fontsize=12, fontweight='bold')
axes[1].set_title('Testing: Absolute Errors', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)
axes[1].tick_params(axis='x', rotation=45)
plt.suptitle('Absolute Forecast Errors: Training vs Testing', fontsize=16, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()
In [75]:
# ============================================================================
# STEP 11: ONE-STEP-AHEAD FUTURE FORECAST
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 11: One-Step-Ahead Future Forecast (Beyond Test Data)")
print("="*100)
print("\n⚠️ IMPORTANT: We can only forecast ONE STEP into the future")
print(" because we need actual Y(t) to calculate diff(Y(t)) for subsequent predictions.")
print("-"*100)
# Get last test data information
last_test_price = df_test.iloc[-1]['Price']
last_test_state = df_test.iloc[-1]['Fuzzy_State']
second_last_test_price = df_test.iloc[-2]['Price']
last_test_date = df_test.iloc[-1]['Date']
# Calculate diff for adjustment
diff_last = last_test_price - second_last_test_price
print(f"\n📋 Last Known Information:")
print(f" • Last Date : {last_test_date}")
print(f" • Last Price Y(t) : ${last_test_price:.2f}")
print(f" • Second Last Price Y(t-1): ${second_last_test_price:.2f}")
print(f" • Last Fuzzy State : A{last_test_state}")
print(f" • diff(Y(t)) : ${diff_last:.2f}")
# Perform one-step-ahead forecast
print(f"\n🔮 Forecasting Next Step...")
print("-"*100)
# Initial forecast
future_initial_forecast = forecast_ftsmc_initial(last_test_state, last_test_price,
transition_matrix, midpoints)
# Final forecast (adjusted)
future_final_forecast = future_initial_forecast + diff_last
# Next date (assuming daily data)
from datetime import timedelta
next_date = last_test_date + timedelta(days=1)
print(f"\n✅ Future Forecast Complete!")
print(f"\n{'='*100}")
print(f"🔮 FORECAST FOR {next_date}")
print(f"{'='*100}")
print(f"\n📊 Detailed Calculation:")
print(f"{'─'*100}")
print(f"\nStep 1: Current State Information")
print(f" • Current Price Y(t) : ${last_test_price:.2f}")
print(f" • Current Fuzzy State : A{last_test_state}")
print(f" • Medoid of A{last_test_state} : ${midpoints[last_test_state]:.2f}")
print(f"\nStep 2: Transition Probabilities from A{last_test_state}")
probs = transition_matrix[last_test_state]
for j in range(k_optimal):
if probs[j] > 0:
print(f" • P(A{last_test_state} → A{j}) = {probs[j]:.4f}")
print(f"\nStep 3: Initial Forecast F(t+1)")
print(f" • Formula: Weighted sum based on transition probabilities")
forecast_calculation = []
for j in range(k_optimal):
if probs[j] > 0:
if j == last_test_state:
contrib = last_test_price * probs[j]
forecast_calculation.append(f"${last_test_price:.2f} × {probs[j]:.4f}")
else:
contrib = midpoints[j] * probs[j]
forecast_calculation.append(f"${midpoints[j]:.2f} × {probs[j]:.4f}")
print(f" • F(t+1) = " + " + ".join(forecast_calculation))
print(f" • F(t+1) = ${future_initial_forecast:.2f}")
print(f"\nStep 4: Adjustment with diff(Y(t))")
print(f" • diff(Y(t)) = Y(t) - Y(t-1)")
print(f" • diff(Y(t)) = ${last_test_price:.2f} - ${second_last_test_price:.2f}")
print(f" • diff(Y(t)) = ${diff_last:.2f}")
print(f"\nStep 5: Final Forecast F^(t+1)")
print(f" • F^(t+1) = F(t+1) + diff(Y(t))")
print(f" • F^(t+1) = ${future_initial_forecast:.2f} + ${diff_last:.2f}")
print(f" • F^(t+1) = ${future_final_forecast:.2f}")
print(f"\n{'='*100}")
print(f"📈 FINAL PREDICTION FOR {next_date}:")
print(f"{'='*100}")
print(f"\n 🔸 Initial Forecast F(t+1) : ${future_initial_forecast:.2f}")
print(f" 🔸 Final Forecast F^(t+1) : ${future_final_forecast:.2f}")
print(f"\n ✅ RECOMMENDED FORECAST: ${future_final_forecast:.2f}")
print(f"{'='*100}")
# Create future forecast DataFrame
future_forecast_data = {
'Date': [next_date],
'Forecast_Type': ['One-Step-Ahead'],
'Previous_Y(t)': [last_test_price],
'Previous_State': [f'A{last_test_state}'],
'diff(Y(t))': [diff_last],
'Initial_Forecast_F(t+1)': [future_initial_forecast],
'Final_Forecast_F^(t+1)': [future_final_forecast]
}
df_future_forecast = pd.DataFrame(future_forecast_data)
# Visualization of future forecast
print(f"\n📊 Visualizing Future Forecast...")
plt.figure(figsize=(20, 8))
# Plot last 50 test points + future forecast
n_display = min(50, len(df_test))
display_test = df_test.tail(n_display)
plt.plot(display_test['Date'], display_test['Price'],
'o-', label='Actual Test Data', color='blue', linewidth=2.5, markersize=6)
# Add future forecast point
plt.plot(next_date, future_final_forecast,
'*', label=f'Future Forecast (${future_final_forecast:.2f})',
color='red', markersize=25, markeredgecolor='black', markeredgewidth=2)
# Add forecast uncertainty band (optional - based on RMSE)
plt.axhspan(future_final_forecast - final_rmse_test,
future_final_forecast + final_rmse_test,
xmin=0.9, alpha=0.2, color='red',
label=f'Uncertainty Band (±RMSE: ${final_rmse_test:.2f})')
# Vertical line at last known data
plt.axvline(last_test_date, color='green', linestyle='--', linewidth=2,
alpha=0.7, label='Last Known Data')
plt.xlabel('Date', fontsize=13, fontweight='bold')
plt.ylabel('Price (USD)', fontsize=13, fontweight='bold')
plt.title(f'One-Step-Ahead Forecast for {next_date}\nBrent Oil Price Prediction',
fontsize=15, fontweight='bold')
plt.legend(fontsize=11, loc='upper left')
plt.grid(True, alpha=0.3, linestyle='--')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
==================================================================================================== 📊 STEP 11: One-Step-Ahead Future Forecast (Beyond Test Data) ==================================================================================================== ⚠️ IMPORTANT: We can only forecast ONE STEP into the future because we need actual Y(t) to calculate diff(Y(t)) for subsequent predictions. ---------------------------------------------------------------------------------------------------- 📋 Last Known Information: • Last Date : 2025-07-31 00:00:00 • Last Price Y(t) : $72.53 • Second Last Price Y(t-1): $73.24 • Last Fuzzy State : A2 • diff(Y(t)) : $-0.71 🔮 Forecasting Next Step... ---------------------------------------------------------------------------------------------------- ✅ Future Forecast Complete! ==================================================================================================== 🔮 FORECAST FOR 2025-08-01 00:00:00 ==================================================================================================== 📊 Detailed Calculation: ──────────────────────────────────────────────────────────────────────────────────────────────────── Step 1: Current State Information • Current Price Y(t) : $72.53 • Current Fuzzy State : A2 • Medoid of A2 : $73.85 Step 2: Transition Probabilities from A2 • P(A2 → A1) = 0.0229 • P(A2 → A2) = 0.9216 • P(A2 → A3) = 0.0556 Step 3: Initial Forecast F(t+1) • Formula: Weighted sum based on transition probabilities • F(t+1) = $55.78 × 0.0229 + $72.53 × 0.9216 + $84.16 × 0.0556 • F(t+1) = $72.79 Step 4: Adjustment with diff(Y(t)) • diff(Y(t)) = Y(t) - Y(t-1) • diff(Y(t)) = $72.53 - $73.24 • diff(Y(t)) = $-0.71 Step 5: Final Forecast F^(t+1) • F^(t+1) = F(t+1) + diff(Y(t)) • F^(t+1) = $72.79 + $-0.71 • F^(t+1) = $72.08 ==================================================================================================== 📈 FINAL PREDICTION FOR 2025-08-01 00:00:00: ==================================================================================================== 🔸 Initial Forecast F(t+1) : $72.79 🔸 Final Forecast F^(t+1) : $72.08 ✅ RECOMMENDED FORECAST: $72.08 ==================================================================================================== 📊 Visualizing Future Forecast...
In [76]:
# ============================================================================
# STEP 12: SAVE ALL RESULTS
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 12: Saving All Results")
print("="*100)
print("\n💾 Saving forecast results to files...")
# Save training forecast
df_forecast_train.to_csv('ftsmc_training_forecast.csv', index=False)
print(" ✓ Training forecast saved to 'ftsmc_training_forecast.csv'")
# Save testing forecast
df_forecast_test.to_csv('ftsmc_testing_forecast.csv', index=False)
print(" ✓ Testing forecast saved to 'ftsmc_testing_forecast.csv'")
# Save future forecast
df_future_forecast.to_csv('ftsmc_future_forecast.csv', index=False)
print(" ✓ Future forecast saved to 'ftsmc_future_forecast.csv'")
# Save transition matrix
transition_df.to_csv('ftsmc_transition_matrix.csv')
print(" ✓ Transition matrix saved to 'ftsmc_transition_matrix.csv'")
# Save fuzzy sets parameters
df_fuzzy_sets.to_csv('ftsmc_fuzzy_sets.csv', index=False)
print(" ✓ Fuzzy sets parameters saved to 'ftsmc_fuzzy_sets.csv'")
# Save FLRG
df_flrg.to_csv('ftsmc_flrg.csv', index=False)
print(" ✓ FLRG saved to 'ftsmc_flrg.csv'")
# Save accuracy comparison
accuracy_summary = {
'Dataset': ['Training', 'Training', 'Testing', 'Testing'],
'Forecast_Type': ['Initial F(t)', 'Final F^(t)', 'Initial F(t)', 'Final F^(t)'],
'MAE': [initial_mae, final_mae, initial_mae_test, final_mae_test],
'MSE': [initial_mse, final_mse, initial_mse_test, final_mse_test],
'RMSE': [initial_rmse, final_rmse, initial_rmse_test, final_rmse_test],
'MAPE': [initial_mape, final_mape, initial_mape_test, final_mape_test],
'Accuracy_Level': [initial_accuracy_level, final_accuracy_level,
initial_accuracy_level_test, final_accuracy_level_test]
}
df_accuracy_summary = pd.DataFrame(accuracy_summary)
df_accuracy_summary.to_csv('ftsmc_accuracy_summary.csv', index=False)
print(" ✓ Accuracy summary saved to 'ftsmc_accuracy_summary.csv'")
print("\n✅ All results saved successfully!")
==================================================================================================== 📊 STEP 12: Saving All Results ==================================================================================================== 💾 Saving forecast results to files... ✓ Training forecast saved to 'ftsmc_training_forecast.csv' ✓ Testing forecast saved to 'ftsmc_testing_forecast.csv' ✓ Future forecast saved to 'ftsmc_future_forecast.csv' ✓ Transition matrix saved to 'ftsmc_transition_matrix.csv' ✓ Fuzzy sets parameters saved to 'ftsmc_fuzzy_sets.csv' ✓ FLRG saved to 'ftsmc_flrg.csv' ✓ Accuracy summary saved to 'ftsmc_accuracy_summary.csv' ✅ All results saved successfully!
In [77]:
# ============================================================================
# STEP 13: FINAL SUMMARY REPORT
# ============================================================================
print("\n" + "="*100)
print("📊 FINAL SUMMARY REPORT: CLARA-FTSMC MODEL")
print("="*100)
print(f"\n🔧 MODEL CONFIGURATION:")
print(f" • Clustering Method : CLARA (Clustering Large Applications)")
print(f" • Optimal Clusters (k) : {k_optimal}")
print(f" • Membership Function : Trapezoid (Q1-Q3 based)")
print(f" • Forecasting Method : Fuzzy Time Series Markov Chain (FTSMC)")
print(f" • Training Data Size : {len(df_train)} observations")
print(f" • Testing Data Size : {len(df_test)} observations")
print(f" • Total Data Points : {len(df)} observations")
print(f"\n📅 DATA RANGE:")
print(f" • Training Period : {df_train['Date'].min()} to {df_train['Date'].max()}")
print(f" • Testing Period : {df_test['Date'].min()} to {df_test['Date'].max()}")
print(f" • Total Period : {df['Date'].min()} to {df['Date'].max()}")
print(f"\n📊 FUZZY STATES SUMMARY:")
for i in range(k_optimal):
params = df_fuzzy_sets.iloc[i]
count_train = (df_train['Fuzzy_State'] == i).sum()
count_test = (df_test['Fuzzy_State'] == i).sum()
pct_train = (count_train / len(df_train)) * 100
pct_test = (count_test / len(df_test)) * 100
print(f" • A{i} [${params['b']:.2f}, ${params['c']:.2f}]:")
print(f" Training: {count_train:4d} ({pct_train:5.2f}%) | Testing: {count_test:4d} ({pct_test:5.2f}%)")
print(f"\n📈 TRAINING PERFORMANCE:")
print(f"{'─'*100}")
print(f" {'Metric':<20} {'Initial F(t)':<20} {'Final F^(t)':<20} {'Better':<15}")
print(f"{'─'*100}")
print(f" {'MAE':<20} ${initial_mae:<19.4f} ${final_mae:<19.4f} {'Final' if final_mae < initial_mae else 'Initial':<15}")
print(f" {'MSE':<20} ${initial_mse:<19.4f} ${final_mse:<19.4f} {'Final' if final_mse < initial_mse else 'Initial':<15}")
print(f" {'RMSE':<20} ${initial_rmse:<19.4f} ${final_rmse:<19.4f} {'Final' if final_rmse < initial_rmse else 'Initial':<15}")
print(f" {'MAPE (%)':<20} {initial_mape:<20.4f} {final_mape:<20.4f} {'Final' if final_mape < initial_mape else 'Initial':<15}")
print(f" {'Accuracy Level':<20} {initial_accuracy_level:<20} {final_accuracy_level:<20}")
print(f"{'─'*100}")
print(f"\n📈 TESTING PERFORMANCE:")
print(f"{'─'*100}")
print(f" {'Metric':<20} {'Initial F(t)':<20} {'Final F^(t)':<20} {'Better':<15}")
print(f"{'─'*100}")
print(f" {'MAE':<20} ${initial_mae_test:<19.4f} ${final_mae_test:<19.4f} {'Final' if final_mae_test < initial_mae_test else 'Initial':<15}")
print(f" {'MSE':<20} ${initial_mse_test:<19.4f} ${final_mse_test:<19.4f} {'Final' if final_mse_test < initial_mse_test else 'Initial':<15}")
print(f" {'RMSE':<20} ${initial_rmse_test:<19.4f} ${final_rmse_test:<19.4f} {'Final' if final_rmse_test < initial_rmse_test else 'Initial':<15}")
print(f" {'MAPE (%)':<20} {initial_mape_test:<20.4f} {final_mape_test:<20.4f} {'Final' if final_mape_test < initial_mape_test else 'Initial':<15}")
print(f" {'Accuracy Level':<20} {initial_accuracy_level_test:<20} {final_accuracy_level_test:<20}")
print(f"{'─'*100}")
print(f"\n🎯 TRANSITION MATRIX STATISTICS:")
non_zero_transitions = (transition_matrix > 0).sum()
total_possible = k_optimal * k_optimal
print(f" • Total Possible Transitions : {total_possible}")
print(f" • Observed Transitions : {non_zero_transitions}")
print(f" • Transition Coverage : {(non_zero_transitions/total_possible)*100:.2f}%")
print(f"\n🔮 FUTURE FORECAST:")
print(f" • Forecast Date : {next_date}")
print(f" • Initial Forecast F(t+1) : ${future_initial_forecast:.2f}")
print(f" • Final Forecast F^(t+1) : ${future_final_forecast:.2f}")
print(f" • Uncertainty (±RMSE) : ${final_rmse_test:.2f}")
print(f" • Forecast Range : ${future_final_forecast - final_rmse_test:.2f} - ${future_final_forecast + final_rmse_test:.2f}")
print(f"\n💡 KEY FINDINGS:")
if final_mae < initial_mae and final_mae_test < initial_mae_test:
print(f" ✅ The adjustment with diff(Y(t)) IMPROVES accuracy in BOTH training and testing")
print(f" - Training improvement: {((initial_mae - final_mae) / initial_mae * 100):.2f}%")
print(f" - Testing improvement: {((initial_mae_test - final_mae_test) / initial_mae_test * 100):.2f}%")
elif final_mae < initial_mae:
print(f" ⚠️ The adjustment improves TRAINING but not TESTING")
print(f" - May indicate overfitting to training data")
elif final_mae_test < initial_mae_test:
print(f" ⚠️ The adjustment improves TESTING but not TRAINING")
print(f" - The model generalizes well despite training performance")
else:
print(f" ⚠️ The adjustment DECREASES accuracy in both datasets")
print(f" - Consider using Initial Forecast without adjustment")
print(f"\n📊 RECOMMENDATION:")
if final_mae_test < initial_mae_test:
print(f" ✅ USE FINAL FORECAST F^(t) for predictions")
print(f" - Better testing performance: MAE = ${final_mae_test:.2f}")
else:
print(f" ✅ USE INITIAL FORECAST F(t) for predictions")
print(f" - Better testing performance: MAE = ${initial_mae_test:.2f}")
print("\n" + "="*100)
print("✅ CLARA-FTSMC MODEL BUILDING AND FORECASTING COMPLETE!")
print("="*100)
print(f"\n📁 Generated Files:")
print(f" 1. ftsmc_training_forecast.csv - Training predictions with details")
print(f" 2. ftsmc_testing_forecast.csv - Testing predictions with details")
print(f" 3. ftsmc_future_forecast.csv - One-step-ahead future prediction")
print(f" 4. ftsmc_transition_matrix.csv - Markov transition probability matrix")
print(f" 5. ftsmc_fuzzy_sets.csv - Fuzzy sets parameters")
print(f" 6. ftsmc_flrg.csv - Fuzzy logical relationship groups")
print(f" 7. ftsmc_accuracy_summary.csv - Complete accuracy comparison")
print("\n" + "="*100)
print("🎉 Thank you for using CLARA-FTSMC Model!")
print("="*100)
====================================================================================================
📊 FINAL SUMMARY REPORT: CLARA-FTSMC MODEL
====================================================================================================
🔧 MODEL CONFIGURATION:
• Clustering Method : CLARA (Clustering Large Applications)
• Optimal Clusters (k) : 6
• Membership Function : Trapezoid (Q1-Q3 based)
• Forecasting Method : Fuzzy Time Series Markov Chain (FTSMC)
• Training Data Size : 1032 observations
• Testing Data Size : 259 observations
• Total Data Points : 1291 observations
📅 DATA RANGE:
• Training Period : 2020-08-03 00:00:00 to 2024-07-30 00:00:00
• Testing Period : 2024-07-31 00:00:00 to 2025-07-31 00:00:00
• Total Period : 2020-08-03 00:00:00 to 2025-07-31 00:00:00
📊 FUZZY STATES SUMMARY:
• A0 [$41.29, $44.43]:
Training: 81 ( 7.85%) | Testing: 0 ( 0.00%)
• A1 [$50.88, $60.67]:
Training: 71 ( 6.88%) | Testing: 9 ( 3.47%)
• A2 [$70.61, $77.09]:
Training: 307 (29.75%) | Testing: 236 (91.12%)
• A3 [$82.43, $85.90]:
Training: 334 (32.36%) | Testing: 14 ( 5.41%)
• A4 [$91.41, $96.49]:
Training: 141 (13.66%) | Testing: 0 ( 0.00%)
• A5 [$107.04, $115.59]:
Training: 98 ( 9.50%) | Testing: 0 ( 0.00%)
📈 TRAINING PERFORMANCE:
────────────────────────────────────────────────────────────────────────────────────────────────────
Metric Initial F(t) Final F^(t) Better
────────────────────────────────────────────────────────────────────────────────────────────────────
MAE $1.4494 $1.9680 Initial
MSE $4.1424 $7.6556 Initial
RMSE $2.0353 $2.7669 Initial
MAPE (%) 1.8210 2.4659 Initial
Accuracy Level EXCELLENT ⭐⭐⭐ EXCELLENT ⭐⭐⭐
────────────────────────────────────────────────────────────────────────────────────────────────────
📈 TESTING PERFORMANCE:
────────────────────────────────────────────────────────────────────────────────────────────────────
Metric Initial F(t) Final F^(t) Better
────────────────────────────────────────────────────────────────────────────────────────────────────
MAE $1.0985 $1.5766 Initial
MSE $2.1239 $4.0474 Initial
RMSE $1.4574 $2.0118 Initial
MAPE (%) 1.5314 2.1876 Initial
Accuracy Level EXCELLENT ⭐⭐⭐ EXCELLENT ⭐⭐⭐
────────────────────────────────────────────────────────────────────────────────────────────────────
🎯 TRANSITION MATRIX STATISTICS:
• Total Possible Transitions : 36
• Observed Transitions : 15
• Transition Coverage : 41.67%
🔮 FUTURE FORECAST:
• Forecast Date : 2025-08-01 00:00:00
• Initial Forecast F(t+1) : $72.79
• Final Forecast F^(t+1) : $72.08
• Uncertainty (±RMSE) : $2.01
• Forecast Range : $70.07 - $74.09
💡 KEY FINDINGS:
⚠️ The adjustment DECREASES accuracy in both datasets
- Consider using Initial Forecast without adjustment
📊 RECOMMENDATION:
✅ USE INITIAL FORECAST F(t) for predictions
- Better testing performance: MAE = $1.10
====================================================================================================
✅ CLARA-FTSMC MODEL BUILDING AND FORECASTING COMPLETE!
====================================================================================================
📁 Generated Files:
1. ftsmc_training_forecast.csv - Training predictions with details
2. ftsmc_testing_forecast.csv - Testing predictions with details
3. ftsmc_future_forecast.csv - One-step-ahead future prediction
4. ftsmc_transition_matrix.csv - Markov transition probability matrix
5. ftsmc_fuzzy_sets.csv - Fuzzy sets parameters
6. ftsmc_flrg.csv - Fuzzy logical relationship groups
7. ftsmc_accuracy_summary.csv - Complete accuracy comparison
====================================================================================================
🎉 Thank you for using CLARA-FTSMC Model!
====================================================================================================
In [78]:
# ============================================================================
# BONUS: Create Summary Visualization
# ============================================================================
print("\n📊 Creating final summary visualization...")
fig = plt.figure(figsize=(20, 12))
gs = fig.add_gridspec(3, 2, hspace=0.3, wspace=0.3)
# Plot 1: Complete Time Series with Forecasts
ax1 = fig.add_subplot(gs[0, :])
ax1.plot(df_train['Date'], df_train['Price'],
label='Actual (Training)', color='blue', linewidth=2, alpha=0.7)
ax1.plot(df_test['Date'], df_test['Price'],
label='Actual (Testing)', color='darkblue', linewidth=2, alpha=0.7)
ax1.plot(df_forecast_train['Date'], df_forecast_train['Final_Forecast_F^(t)'],
label='Forecast (Training)', color='red', linewidth=1.5, alpha=0.6)
ax1.plot(df_forecast_test['Date'], df_forecast_test['Final_Forecast_F^(t)'],
label='Forecast (Testing)', color='darkred', linewidth=1.5, alpha=0.6)
ax1.plot(next_date, future_final_forecast, '*',
label=f'Future Forecast', color='green', markersize=20,
markeredgecolor='black', markeredgewidth=2)
ax1.axvline(df_test['Date'].iloc[0], color='black', linestyle='--',
linewidth=2, alpha=0.5, label='Train/Test Split')
ax1.set_xlabel('Date', fontsize=11, fontweight='bold')
ax1.set_ylabel('Price (USD)', fontsize=11, fontweight='bold')
ax1.set_title('Complete CLARA-FTSMC Model: Training, Testing & Future Forecast',
fontsize=13, fontweight='bold')
ax1.legend(fontsize=9, loc='upper left')
ax1.grid(True, alpha=0.3)
ax1.tick_params(axis='x', rotation=45)
# Plot 2: Training Accuracy Comparison
ax2 = fig.add_subplot(gs[1, 0])
metrics = ['MAE', 'MSE', 'RMSE', 'MAPE']
initial_vals = [initial_mae, initial_mse, initial_rmse, initial_mape]
final_vals = [final_mae, final_mse, final_rmse, final_mape]
x = np.arange(len(metrics))
width = 0.35
ax2.bar(x - width/2, initial_vals, width, label='Initial F(t)', color='orange', alpha=0.7)
ax2.bar(x + width/2, final_vals, width, label='Final F^(t)', color='red', alpha=0.7)
ax2.set_xlabel('Metrics', fontsize=11, fontweight='bold')
ax2.set_ylabel('Error Value', fontsize=11, fontweight='bold')
ax2.set_title('Training Accuracy Comparison', fontsize=12, fontweight='bold')
ax2.set_xticks(x)
ax2.set_xticklabels(metrics)
ax2.legend(fontsize=9)
ax2.grid(True, alpha=0.3, axis='y')
# Plot 3: Testing Accuracy Comparison
ax3 = fig.add_subplot(gs[1, 1])
initial_vals_test = [initial_mae_test, initial_mse_test, initial_rmse_test, initial_mape_test]
final_vals_test = [final_mae_test, final_mse_test, final_rmse_test, final_mape_test]
ax3.bar(x - width/2, initial_vals_test, width, label='Initial F(t)', color='orange', alpha=0.7)
ax3.bar(x + width/2, final_vals_test, width, label='Final F^(t)', color='red', alpha=0.7)
ax3.set_xlabel('Metrics', fontsize=11, fontweight='bold')
ax3.set_ylabel('Error Value', fontsize=11, fontweight='bold')
ax3.set_title('Testing Accuracy Comparison', fontsize=12, fontweight='bold')
ax3.set_xticks(x)
ax3.set_xticklabels(metrics)
ax3.legend(fontsize=9)
ax3.grid(True, alpha=0.3, axis='y')
# Plot 4: Fuzzy States Distribution
ax4 = fig.add_subplot(gs[2, 0])
states = [f'A{i}' for i in range(k_optimal)]
train_counts = [((df_train['Fuzzy_State'] == i).sum()) for i in range(k_optimal)]
test_counts = [((df_test['Fuzzy_State'] == i).sum()) for i in range(k_optimal)]
x_states = np.arange(len(states))
ax4.bar(x_states - width/2, train_counts, width, label='Training', color='blue', alpha=0.7)
ax4.bar(x_states + width/2, test_counts, width, label='Testing', color='darkblue', alpha=0.7)
ax4.set_xlabel('Fuzzy States', fontsize=11, fontweight='bold')
ax4.set_ylabel('Frequency', fontsize=11, fontweight='bold')
ax4.set_title('Fuzzy States Distribution', fontsize=12, fontweight='bold')
ax4.set_xticks(x_states)
ax4.set_xticklabels(states)
ax4.legend(fontsize=9)
ax4.grid(True, alpha=0.3, axis='y')
# Plot 5: Transition Matrix Heatmap
ax5 = fig.add_subplot(gs[2, 1])
im = ax5.imshow(transition_matrix, cmap='YlOrRd', aspect='auto', vmin=0, vmax=1)
ax5.set_xticks(np.arange(k_optimal))
ax5.set_yticks(np.arange(k_optimal))
ax5.set_xticklabels([f'A{i}' for i in range(k_optimal)])
ax5.set_yticklabels([f'A{i}' for i in range(k_optimal)])
ax5.set_xlabel('To State', fontsize=11, fontweight='bold')
ax5.set_ylabel('From State', fontsize=11, fontweight='bold')
ax5.set_title('Markov Transition Probability Matrix', fontsize=12, fontweight='bold')
# Add text annotations
for i in range(k_optimal):
for j in range(k_optimal):
if transition_matrix[i, j] > 0:
text = ax5.text(j, i, f'{transition_matrix[i, j]:.2f}',
ha="center", va="center", color="black", fontsize=8)
plt.colorbar(im, ax=ax5, label='Probability')
plt.suptitle('CLARA-FTSMC Model: Complete Summary Dashboard',
fontsize=16, fontweight='bold', y=0.995)
plt.tight_layout()
plt.savefig('ftsmc_summary_dashboard.png', dpi=300, bbox_inches='tight')
print(" ✓ Summary dashboard saved to 'ftsmc_summary_dashboard.png'")
plt.show()
print("\n✅ All visualizations and reports completed!")
print("\n" + "="*100)
print("🎊 CLARA-FTSMC MODEL SUCCESSFULLY COMPLETED! 🎊")
print("="*100)
📊 Creating final summary visualization... ✓ Summary dashboard saved to 'ftsmc_summary_dashboard.png'
✅ All visualizations and reports completed! ==================================================================================================== 🎊 CLARA-FTSMC MODEL SUCCESSFULLY COMPLETED! 🎊 ====================================================================================================
In [79]:
# ============================================================================
# STEP 14: MULTI-STEP-AHEAD FORECAST (5 STEPS)
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 14: Multi-Step-Ahead Forecast (5 Steps into the Future)")
print("="*100)
print("\n⚠️ IMPORTANT: Multi-step forecast uses PREDICTED values as 'actual' for next step")
print(" This introduces cumulative uncertainty - accuracy decreases with each step.")
print("-"*100)
# Function to assign fuzzy state to a single price value
def get_fuzzy_state_from_price(price, df_fuzzy_sets, k_optimal):
"""
Determine fuzzy state for a given price based on membership functions
"""
memberships = []
for i in range(k_optimal):
params = df_fuzzy_sets.iloc[i]
mu = trapezoid_membership(price, params['a'], params['b'], params['c'], params['d'])
memberships.append(mu)
# Return state with maximum membership
max_membership = max(memberships)
fuzzy_state = memberships.index(max_membership)
return fuzzy_state, max_membership, memberships
# Initialize for multi-step forecast
n_future_steps = 5
multi_step_forecasts = []
# Get initial information from last test data
current_price = df_test.iloc[-1]['Price']
previous_price = df_test.iloc[-2]['Price']
current_date = df_test.iloc[-1]['Date']
print(f"\n📋 Starting Information:")
print(f" • Last Known Date : {current_date}")
print(f" • Last Known Price : ${current_price:.2f}")
print(f" • Previous Price : ${previous_price:.2f}")
print(f" • Initial diff(Y(t)) : ${current_price - previous_price:.2f}")
print("\n" + "="*100)
print("🔮 PERFORMING 5-STEP-AHEAD FORECASTING")
print("="*100)
# Perform multi-step forecasting
for step in range(1, n_future_steps + 1):
print(f"\n{'─'*100}")
print(f"FORECAST STEP {step}")
print(f"{'─'*100}")
# Calculate next date
from datetime import timedelta
next_date = current_date + timedelta(days=1)
# Get fuzzy state for current price
current_state, current_membership, all_memberships = get_fuzzy_state_from_price(
current_price, df_fuzzy_sets, k_optimal
)
print(f"\n📅 Target Date: {next_date}")
print(f"\n Step 1: Current State Information")
print(f" • Current Price Y(t) : ${current_price:.2f}")
print(f" • Previous Price Y(t-1) : ${previous_price:.2f}")
print(f" • Current Fuzzy State : A{current_state}")
print(f" • Membership Degree : {current_membership:.4f}")
# Show all memberships
print(f"\n • Membership degrees for current price:")
for i in range(k_optimal):
if all_memberships[i] > 0:
print(f" - μ_A{i}(${current_price:.2f}) = {all_memberships[i]:.4f}")
# Calculate diff
diff_current = current_price - previous_price
print(f"\n Step 2: Calculate diff(Y(t))")
print(f" • diff(Y(t)) = Y(t) - Y(t-1)")
print(f" • diff(Y(t)) = ${current_price:.2f} - ${previous_price:.2f}")
print(f" • diff(Y(t)) = ${diff_current:.2f}")
# Initial forecast
print(f"\n Step 3: Initial Forecast F(t+1)")
print(f" • Transition probabilities from A{current_state}:")
probs = transition_matrix[current_state]
forecast_components = []
for j in range(k_optimal):
if probs[j] > 0:
if j == current_state:
component = current_price * probs[j]
print(f" - P(A{current_state} → A{j}) = {probs[j]:.4f} × ${current_price:.2f} (current price) = ${component:.2f}")
forecast_components.append(component)
else:
component = midpoints[j] * probs[j]
print(f" - P(A{current_state} → A{j}) = {probs[j]:.4f} × ${midpoints[j]:.2f} (midpoint) = ${component:.2f}")
forecast_components.append(component)
initial_forecast = forecast_ftsmc_initial(current_state, current_price,
transition_matrix, midpoints)
print(f"\n • F(t+1) = {' + '.join([f'${comp:.2f}' for comp in forecast_components])}")
print(f" • F(t+1) = ${initial_forecast:.2f}")
# Final forecast (adjusted)
print(f"\n Step 4: Final Forecast F^(t+1) with Adjustment")
print(f" • F^(t+1) = F(t+1) + diff(Y(t))")
print(f" • F^(t+1) = ${initial_forecast:.2f} + ${diff_current:.2f}")
final_forecast = initial_forecast + diff_current
print(f" • F^(t+1) = ${final_forecast:.2f}")
# Store results
multi_step_forecasts.append({
'Step': step,
'Date': next_date,
'Y(t)_Used': current_price,
'Y(t-1)_Used': previous_price,
'Fuzzy_State': f'A{current_state}',
'Membership': current_membership,
'diff(Y(t))': diff_current,
'Initial_Forecast_F(t+1)': initial_forecast,
'Final_Forecast_F^(t+1)': final_forecast,
'Source': 'Actual' if step == 1 else 'Predicted'
})
print(f"\n ✅ Forecast Complete for {next_date}:")
print(f" • Initial Forecast : ${initial_forecast:.2f}")
print(f" • Final Forecast : ${final_forecast:.2f}")
if step < n_future_steps:
print(f"\n 🔄 Moving to next step...")
print(f" • Using F^(t+1) = ${final_forecast:.2f} as 'actual' for next prediction")
# Update for next iteration
previous_price = current_price
current_price = final_forecast # Use forecast as "actual" for next step
current_date = next_date
# Create DataFrame for multi-step forecasts
df_multi_step = pd.DataFrame(multi_step_forecasts)
print("\n" + "="*100)
print("✅ 5-STEP-AHEAD FORECASTING COMPLETE!")
print("="*100)
# Display results
print("\n📊 MULTI-STEP FORECAST RESULTS:")
print("-"*100)
print(df_multi_step.to_string(index=False, float_format=lambda x: f'{x:.2f}' if isinstance(x, float) else x))
====================================================================================================
📊 STEP 14: Multi-Step-Ahead Forecast (5 Steps into the Future)
====================================================================================================
⚠️ IMPORTANT: Multi-step forecast uses PREDICTED values as 'actual' for next step
This introduces cumulative uncertainty - accuracy decreases with each step.
----------------------------------------------------------------------------------------------------
📋 Starting Information:
• Last Known Date : 2025-07-31 00:00:00
• Last Known Price : $72.53
• Previous Price : $73.24
• Initial diff(Y(t)) : $-0.71
====================================================================================================
🔮 PERFORMING 5-STEP-AHEAD FORECASTING
====================================================================================================
────────────────────────────────────────────────────────────────────────────────────────────────────
FORECAST STEP 1
────────────────────────────────────────────────────────────────────────────────────────────────────
📅 Target Date: 2025-08-01 00:00:00
Step 1: Current State Information
• Current Price Y(t) : $72.53
• Previous Price Y(t-1) : $73.24
• Current Fuzzy State : A2
• Membership Degree : 1.0000
• Membership degrees for current price:
- μ_A2($72.53) = 1.0000
Step 2: Calculate diff(Y(t))
• diff(Y(t)) = Y(t) - Y(t-1)
• diff(Y(t)) = $72.53 - $73.24
• diff(Y(t)) = $-0.71
Step 3: Initial Forecast F(t+1)
• Transition probabilities from A2:
- P(A2 → A1) = 0.0229 × $55.78 (midpoint) = $1.28
- P(A2 → A2) = 0.9216 × $72.53 (current price) = $66.84
- P(A2 → A3) = 0.0556 × $84.16 (midpoint) = $4.68
• F(t+1) = $1.28 + $66.84 + $4.68
• F(t+1) = $72.79
Step 4: Final Forecast F^(t+1) with Adjustment
• F^(t+1) = F(t+1) + diff(Y(t))
• F^(t+1) = $72.79 + $-0.71
• F^(t+1) = $72.08
✅ Forecast Complete for 2025-08-01 00:00:00:
• Initial Forecast : $72.79
• Final Forecast : $72.08
🔄 Moving to next step...
• Using F^(t+1) = $72.08 as 'actual' for next prediction
────────────────────────────────────────────────────────────────────────────────────────────────────
FORECAST STEP 2
────────────────────────────────────────────────────────────────────────────────────────────────────
📅 Target Date: 2025-08-02 00:00:00
Step 1: Current State Information
• Current Price Y(t) : $72.08
• Previous Price Y(t-1) : $72.53
• Current Fuzzy State : A2
• Membership Degree : 1.0000
• Membership degrees for current price:
- μ_A2($72.08) = 1.0000
Step 2: Calculate diff(Y(t))
• diff(Y(t)) = Y(t) - Y(t-1)
• diff(Y(t)) = $72.08 - $72.53
• diff(Y(t)) = $-0.45
Step 3: Initial Forecast F(t+1)
• Transition probabilities from A2:
- P(A2 → A1) = 0.0229 × $55.78 (midpoint) = $1.28
- P(A2 → A2) = 0.9216 × $72.08 (current price) = $66.43
- P(A2 → A3) = 0.0556 × $84.16 (midpoint) = $4.68
• F(t+1) = $1.28 + $66.43 + $4.68
• F(t+1) = $72.38
Step 4: Final Forecast F^(t+1) with Adjustment
• F^(t+1) = F(t+1) + diff(Y(t))
• F^(t+1) = $72.38 + $-0.45
• F^(t+1) = $71.93
✅ Forecast Complete for 2025-08-02 00:00:00:
• Initial Forecast : $72.38
• Final Forecast : $71.93
🔄 Moving to next step...
• Using F^(t+1) = $71.93 as 'actual' for next prediction
────────────────────────────────────────────────────────────────────────────────────────────────────
FORECAST STEP 3
────────────────────────────────────────────────────────────────────────────────────────────────────
📅 Target Date: 2025-08-03 00:00:00
Step 1: Current State Information
• Current Price Y(t) : $71.93
• Previous Price Y(t-1) : $72.08
• Current Fuzzy State : A2
• Membership Degree : 1.0000
• Membership degrees for current price:
- μ_A2($71.93) = 1.0000
Step 2: Calculate diff(Y(t))
• diff(Y(t)) = Y(t) - Y(t-1)
• diff(Y(t)) = $71.93 - $72.08
• diff(Y(t)) = $-0.15
Step 3: Initial Forecast F(t+1)
• Transition probabilities from A2:
- P(A2 → A1) = 0.0229 × $55.78 (midpoint) = $1.28
- P(A2 → A2) = 0.9216 × $71.93 (current price) = $66.29
- P(A2 → A3) = 0.0556 × $84.16 (midpoint) = $4.68
• F(t+1) = $1.28 + $66.29 + $4.68
• F(t+1) = $72.24
Step 4: Final Forecast F^(t+1) with Adjustment
• F^(t+1) = F(t+1) + diff(Y(t))
• F^(t+1) = $72.24 + $-0.15
• F^(t+1) = $72.10
✅ Forecast Complete for 2025-08-03 00:00:00:
• Initial Forecast : $72.24
• Final Forecast : $72.10
🔄 Moving to next step...
• Using F^(t+1) = $72.10 as 'actual' for next prediction
────────────────────────────────────────────────────────────────────────────────────────────────────
FORECAST STEP 4
────────────────────────────────────────────────────────────────────────────────────────────────────
📅 Target Date: 2025-08-04 00:00:00
Step 1: Current State Information
• Current Price Y(t) : $72.10
• Previous Price Y(t-1) : $71.93
• Current Fuzzy State : A2
• Membership Degree : 1.0000
• Membership degrees for current price:
- μ_A2($72.10) = 1.0000
Step 2: Calculate diff(Y(t))
• diff(Y(t)) = Y(t) - Y(t-1)
• diff(Y(t)) = $72.10 - $71.93
• diff(Y(t)) = $0.16
Step 3: Initial Forecast F(t+1)
• Transition probabilities from A2:
- P(A2 → A1) = 0.0229 × $55.78 (midpoint) = $1.28
- P(A2 → A2) = 0.9216 × $72.10 (current price) = $66.44
- P(A2 → A3) = 0.0556 × $84.16 (midpoint) = $4.68
• F(t+1) = $1.28 + $66.44 + $4.68
• F(t+1) = $72.39
Step 4: Final Forecast F^(t+1) with Adjustment
• F^(t+1) = F(t+1) + diff(Y(t))
• F^(t+1) = $72.39 + $0.16
• F^(t+1) = $72.55
✅ Forecast Complete for 2025-08-04 00:00:00:
• Initial Forecast : $72.39
• Final Forecast : $72.55
🔄 Moving to next step...
• Using F^(t+1) = $72.55 as 'actual' for next prediction
────────────────────────────────────────────────────────────────────────────────────────────────────
FORECAST STEP 5
────────────────────────────────────────────────────────────────────────────────────────────────────
📅 Target Date: 2025-08-05 00:00:00
Step 1: Current State Information
• Current Price Y(t) : $72.55
• Previous Price Y(t-1) : $72.10
• Current Fuzzy State : A2
• Membership Degree : 1.0000
• Membership degrees for current price:
- μ_A2($72.55) = 1.0000
Step 2: Calculate diff(Y(t))
• diff(Y(t)) = Y(t) - Y(t-1)
• diff(Y(t)) = $72.55 - $72.10
• diff(Y(t)) = $0.46
Step 3: Initial Forecast F(t+1)
• Transition probabilities from A2:
- P(A2 → A1) = 0.0229 × $55.78 (midpoint) = $1.28
- P(A2 → A2) = 0.9216 × $72.55 (current price) = $66.86
- P(A2 → A3) = 0.0556 × $84.16 (midpoint) = $4.68
• F(t+1) = $1.28 + $66.86 + $4.68
• F(t+1) = $72.82
Step 4: Final Forecast F^(t+1) with Adjustment
• F^(t+1) = F(t+1) + diff(Y(t))
• F^(t+1) = $72.82 + $0.46
• F^(t+1) = $73.27
✅ Forecast Complete for 2025-08-05 00:00:00:
• Initial Forecast : $72.82
• Final Forecast : $73.27
====================================================================================================
✅ 5-STEP-AHEAD FORECASTING COMPLETE!
====================================================================================================
📊 MULTI-STEP FORECAST RESULTS:
----------------------------------------------------------------------------------------------------
Step Date Y(t)_Used Y(t-1)_Used Fuzzy_State Membership diff(Y(t)) Initial_Forecast_F(t+1) Final_Forecast_F^(t+1) Source
1 2025-08-01 72.53 73.24 A2 1.00 -0.71 72.79 72.08 Actual
2 2025-08-02 72.08 72.53 A2 1.00 -0.45 72.38 71.93 Predicted
3 2025-08-03 71.93 72.08 A2 1.00 -0.15 72.24 72.10 Predicted
4 2025-08-04 72.10 71.93 A2 1.00 0.16 72.39 72.55 Predicted
5 2025-08-05 72.55 72.10 A2 1.00 0.46 72.82 73.27 Predicted
In [80]:
# ============================================================================
# STEP 15: VISUALIZE MULTI-STEP FORECAST
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 15: Visualizing Multi-Step Forecast")
print("="*100)
# Prepare data for visualization
last_actual_date = df_test.iloc[-1]['Date']
last_actual_price = df_test.iloc[-1]['Price']
# Plot 1: Multi-Step Forecast with Actual Data
fig, axes = plt.subplots(2, 1, figsize=(20, 14))
# Subplot 1: Full view with training and testing
ax1 = axes[0]
ax1.plot(df_train['Date'], df_train['Price'],
label='Actual (Training)', color='blue', linewidth=2, alpha=0.7)
ax1.plot(df_test['Date'], df_test['Price'],
label='Actual (Testing)', color='darkblue', linewidth=2.5, alpha=0.8)
# Add multi-step forecasts
forecast_dates = df_multi_step['Date'].tolist()
initial_forecasts = df_multi_step['Initial_Forecast_F(t+1)'].tolist()
final_forecasts = df_multi_step['Final_Forecast_F^(t+1)'].tolist()
# Connect last actual to first forecast
ax1.plot([last_actual_date, forecast_dates[0]],
[last_actual_price, final_forecasts[0]],
'r--', linewidth=2, alpha=0.5)
ax1.plot(forecast_dates, final_forecasts,
'o-', label='Multi-Step Forecast (Final)', color='red',
linewidth=2.5, markersize=10, markeredgecolor='black', markeredgewidth=1.5)
ax1.plot(forecast_dates, initial_forecasts,
's--', label='Multi-Step Forecast (Initial)', color='orange',
linewidth=2, markersize=8, markeredgecolor='black', markeredgewidth=1, alpha=0.7)
# Mark each forecast step
for i, row in df_multi_step.iterrows():
ax1.annotate(f"Step {row['Step']}\n${row['Final_Forecast_F^(t+1)']:.2f}",
xy=(row['Date'], row['Final_Forecast_F^(t+1)']),
xytext=(10, 10), textcoords='offset points',
fontsize=9, fontweight='bold',
bbox=dict(boxstyle='round,pad=0.5', facecolor='yellow', alpha=0.7),
arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0', lw=1.5))
# Vertical line at last actual data
ax1.axvline(last_actual_date, color='green', linestyle='--', linewidth=2.5,
alpha=0.7, label='Last Known Data')
# Shade forecast region
ax1.axvspan(last_actual_date, forecast_dates[-1], alpha=0.1, color='red',
label='Forecast Region')
ax1.set_xlabel('Date', fontsize=13, fontweight='bold')
ax1.set_ylabel('Price (USD)', fontsize=13, fontweight='bold')
ax1.set_title('Complete Time Series with 5-Step-Ahead Forecast\nCLARA-FTSMC Model',
fontsize=15, fontweight='bold')
ax1.legend(fontsize=10, loc='upper left')
ax1.grid(True, alpha=0.3, linestyle='--')
ax1.tick_params(axis='x', rotation=45)
# Subplot 2: Zoomed view (last 50 test points + forecast)
ax2 = axes[1]
n_display = min(50, len(df_test))
display_test = df_test.tail(n_display)
ax2.plot(display_test['Date'], display_test['Price'],
'o-', label='Actual (Testing)', color='darkblue',
linewidth=2.5, markersize=6, markeredgecolor='black', markeredgewidth=0.5)
# Connect last actual to first forecast
ax2.plot([last_actual_date, forecast_dates[0]],
[last_actual_price, final_forecasts[0]],
'r--', linewidth=2, alpha=0.5)
ax2.plot(forecast_dates, final_forecasts,
'o-', label='Multi-Step Forecast (Final)', color='red',
linewidth=3, markersize=12, markeredgecolor='black', markeredgewidth=2)
ax2.plot(forecast_dates, initial_forecasts,
's--', label='Multi-Step Forecast (Initial)', color='orange',
linewidth=2.5, markersize=10, markeredgecolor='black', markeredgewidth=1.5, alpha=0.7)
# Mark each forecast step with values
for i, row in df_multi_step.iterrows():
# Final forecast annotation
ax2.annotate(f"${row['Final_Forecast_F^(t+1)']:.2f}",
xy=(row['Date'], row['Final_Forecast_F^(t+1)']),
xytext=(0, 15), textcoords='offset points',
fontsize=10, fontweight='bold', color='red',
ha='center',
bbox=dict(boxstyle='round,pad=0.4', facecolor='yellow', alpha=0.8))
# Vertical line at last actual data
ax2.axvline(last_actual_date, color='green', linestyle='--', linewidth=2.5,
alpha=0.7, label='Last Known Data')
# Shade forecast region
ax2.axvspan(last_actual_date, forecast_dates[-1], alpha=0.15, color='red')
ax2.set_xlabel('Date', fontsize=13, fontweight='bold')
ax2.set_ylabel('Price (USD)', fontsize=13, fontweight='bold')
ax2.set_title(f'Detailed View: Last {n_display} Test Points + 5-Step Forecast',
fontsize=15, fontweight='bold')
ax2.legend(fontsize=11, loc='upper left')
ax2.grid(True, alpha=0.3, linestyle='--')
ax2.tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.savefig('ftsmc_multi_step_forecast.png', dpi=300, bbox_inches='tight')
print(" ✓ Multi-step forecast plot saved to 'ftsmc_multi_step_forecast.png'")
plt.show()
==================================================================================================== 📊 STEP 15: Visualizing Multi-Step Forecast ==================================================================================================== ✓ Multi-step forecast plot saved to 'ftsmc_multi_step_forecast.png'
In [81]:
# Plot 2: Forecast Trajectory Analysis
fig, axes = plt.subplots(1, 2, figsize=(20, 7))
# Subplot 1: Initial vs Final Forecast Comparison
ax1 = axes[0]
steps = df_multi_step['Step'].tolist()
ax1.plot(steps, initial_forecasts, 'o-', label='Initial Forecast F(t)',
color='orange', linewidth=2.5, markersize=10, markeredgecolor='black', markeredgewidth=1.5)
ax1.plot(steps, final_forecasts, 's-', label='Final Forecast F^(t)',
color='red', linewidth=2.5, markersize=10, markeredgecolor='black', markeredgewidth=1.5)
# Add value labels
for i in range(len(steps)):
ax1.text(steps[i], initial_forecasts[i], f'${initial_forecasts[i]:.2f}',
fontsize=9, ha='center', va='bottom', fontweight='bold')
ax1.text(steps[i], final_forecasts[i], f'${final_forecasts[i]:.2f}',
fontsize=9, ha='center', va='top', fontweight='bold')
ax1.axhline(last_actual_price, color='blue', linestyle='--', linewidth=2,
alpha=0.7, label=f'Last Actual Price: ${last_actual_price:.2f}')
ax1.set_xlabel('Forecast Step', fontsize=12, fontweight='bold')
ax1.set_ylabel('Forecasted Price (USD)', fontsize=12, fontweight='bold')
ax1.set_title('Forecast Trajectory: Initial vs Final', fontsize=14, fontweight='bold')
ax1.set_xticks(steps)
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3)
# Subplot 2: Forecast Differences
ax2 = axes[1]
differences = [final_forecasts[i] - initial_forecasts[i] for i in range(len(steps))]
colors_bar = ['green' if d > 0 else 'red' for d in differences]
bars = ax2.bar(steps, differences, color=colors_bar, alpha=0.7, edgecolor='black', linewidth=1.5)
# Add value labels on bars
for i, (step, diff) in enumerate(zip(steps, differences)):
ax2.text(step, diff, f'${diff:.2f}',
ha='center', va='bottom' if diff > 0 else 'top',
fontsize=10, fontweight='bold')
ax2.axhline(0, color='black', linestyle='-', linewidth=2)
ax2.set_xlabel('Forecast Step', fontsize=12, fontweight='bold')
ax2.set_ylabel('Difference: Final - Initial (USD)', fontsize=12, fontweight='bold')
ax2.set_title('Adjustment Effect: diff(Y(t)) Impact per Step', fontsize=14, fontweight='bold')
ax2.set_xticks(steps)
ax2.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('ftsmc_multi_step_analysis.png', dpi=300, bbox_inches='tight')
print(" ✓ Multi-step analysis plot saved to 'ftsmc_multi_step_analysis.png'")
plt.show()
# Plot 3: Fuzzy State Transitions
fig, ax = plt.subplots(figsize=(18, 8))
# Get fuzzy states for each step
fuzzy_states_journey = [int(state[1]) for state in df_multi_step['Fuzzy_State']]
# Plot state transitions
for i in range(len(fuzzy_states_journey)):
step_num = df_multi_step.iloc[i]['Step']
state = fuzzy_states_journey[i]
price = df_multi_step.iloc[i]['Final_Forecast_F^(t+1)']
# Get state boundaries
params = df_fuzzy_sets.iloc[state]
# Plot point
ax.scatter(step_num, price, s=300, c=colors[state],
edgecolors='black', linewidths=2, zorder=5, alpha=0.8)
# Add state label
ax.text(step_num, price, f'A{state}',
ha='center', va='center', fontsize=12, fontweight='bold', color='white')
# Show state range
ax.plot([step_num, step_num], [params['b'], params['c']],
color=colors[state], linewidth=8, alpha=0.3, zorder=1)
# Connect the journey
ax.plot(df_multi_step['Step'], final_forecasts, 'k--', linewidth=2, alpha=0.5, zorder=2)
# Add horizontal lines for each fuzzy set boundary
for i in range(k_optimal):
params = df_fuzzy_sets.iloc[i]
ax.axhline(params['b'], color=colors[i], linestyle=':', linewidth=1, alpha=0.3)
ax.axhline(params['c'], color=colors[i], linestyle=':', linewidth=1, alpha=0.3)
ax.text(0.5, (params['b'] + params['c'])/2, f'A{i}',
fontsize=10, fontweight='bold', color=colors[i],
bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7))
ax.set_xlabel('Forecast Step', fontsize=13, fontweight='bold')
ax.set_ylabel('Price (USD)', fontsize=13, fontweight='bold')
ax.set_title('Fuzzy State Transitions Across 5-Step Forecast\nShaded regions show Q1-Q3 range of each state',
fontsize=15, fontweight='bold')
ax.set_xticks(steps)
ax.grid(True, alpha=0.3)
ax.set_xlim(0.5, 5.5)
plt.tight_layout()
plt.savefig('ftsmc_state_transitions.png', dpi=300, bbox_inches='tight')
print(" ✓ State transitions plot saved to 'ftsmc_state_transitions.png'")
plt.show()
✓ Multi-step analysis plot saved to 'ftsmc_multi_step_analysis.png'
✓ State transitions plot saved to 'ftsmc_state_transitions.png'
In [82]:
# ============================================================================
# STEP 16: MULTI-STEP FORECAST SUMMARY & UNCERTAINTY ANALYSIS
# ============================================================================
print("\n" + "="*100)
print("📊 STEP 16: Multi-Step Forecast Summary & Uncertainty Analysis")
print("="*100)
print("\n📋 DETAILED FORECAST SUMMARY:")
print("-"*100)
print(f"{'Step':<6} {'Date':<12} {'Fuzzy State':<12} {'Initial F(t)':<15} {'Final F^(t)':<15} {'Adjustment':<12} {'Source':<12}")
print("-"*100)
for idx, row in df_multi_step.iterrows():
adjustment = row['Final_Forecast_F^(t+1)'] - row['Initial_Forecast_F(t+1)']
print(f"{row['Step']:<6} {str(row['Date'])[:10]:<12} {row['Fuzzy_State']:<12} "
f"${row['Initial_Forecast_F(t+1)']:<14.2f} ${row['Final_Forecast_F^(t+1)']:<14.2f} "
f"${adjustment:<11.2f} {row['Source']:<12}")
print("-"*100)
# Calculate price changes
print("\n📈 FORECAST PRICE CHANGES:")
print("-"*100)
print(f" • Starting Price (Last Actual): ${last_actual_price:.2f}")
for idx, row in df_multi_step.iterrows():
if idx == 0:
change = row['Final_Forecast_F^(t+1)'] - last_actual_price
pct_change = (change / last_actual_price) * 100
else:
prev_price = df_multi_step.iloc[idx-1]['Final_Forecast_F^(t+1)']
change = row['Final_Forecast_F^(t+1)'] - prev_price
pct_change = (change / prev_price) * 100
direction = "↑" if change > 0 else "↓"
print(f" • Step {row['Step']}: ${row['Final_Forecast_F^(t+1)']:.2f} "
f"({direction} ${abs(change):.2f}, {pct_change:+.2f}%)")
# Overall change
total_change = df_multi_step.iloc[-1]['Final_Forecast_F^(t+1)'] - last_actual_price
total_pct = (total_change / last_actual_price) * 100
print(f"\n • TOTAL CHANGE (5 steps): {total_change:+.2f} USD ({total_pct:+.2f}%)")
# Uncertainty estimation
print("\n📊 UNCERTAINTY ESTIMATION:")
print("-"*100)
print(f" • Based on Testing Performance:")
print(f" - Final Forecast MAE : ${final_mae_test:.2f}")
print(f" - Final Forecast RMSE: ${final_rmse_test:.2f}")
print(f"\n • Cumulative Uncertainty (increases with each step):")
for step in range(1, n_future_steps + 1):
# Cumulative uncertainty grows with each step
cumulative_uncertainty = final_rmse_test * np.sqrt(step)
forecast_value = df_multi_step[df_multi_step['Step'] == step]['Final_Forecast_F^(t+1)'].values[0]
lower_bound = forecast_value - cumulative_uncertainty
upper_bound = forecast_value + cumulative_uncertainty
print(f" Step {step}: ${forecast_value:.2f} ± ${cumulative_uncertainty:.2f}")
print(f" Range: [${lower_bound:.2f}, ${upper_bound:.2f}]")
====================================================================================================
📊 STEP 16: Multi-Step Forecast Summary & Uncertainty Analysis
====================================================================================================
📋 DETAILED FORECAST SUMMARY:
----------------------------------------------------------------------------------------------------
Step Date Fuzzy State Initial F(t) Final F^(t) Adjustment Source
----------------------------------------------------------------------------------------------------
1 2025-08-01 A2 $72.79 $72.08 $-0.71 Actual
2 2025-08-02 A2 $72.38 $71.93 $-0.45 Predicted
3 2025-08-03 A2 $72.24 $72.10 $-0.15 Predicted
4 2025-08-04 A2 $72.39 $72.55 $0.16 Predicted
5 2025-08-05 A2 $72.82 $73.27 $0.46 Predicted
----------------------------------------------------------------------------------------------------
📈 FORECAST PRICE CHANGES:
----------------------------------------------------------------------------------------------------
• Starting Price (Last Actual): $72.53
• Step 1: $72.08 (↓ $0.45, -0.62%)
• Step 2: $71.93 (↓ $0.15, -0.21%)
• Step 3: $72.10 (↑ $0.16, +0.22%)
• Step 4: $72.55 (↑ $0.46, +0.64%)
• Step 5: $73.27 (↑ $0.72, +0.99%)
• TOTAL CHANGE (5 steps): +0.74 USD (+1.03%)
📊 UNCERTAINTY ESTIMATION:
----------------------------------------------------------------------------------------------------
• Based on Testing Performance:
- Final Forecast MAE : $1.58
- Final Forecast RMSE: $2.01
• Cumulative Uncertainty (increases with each step):
Step 1: $72.08 ± $2.01
Range: [$70.07, $74.09]
Step 2: $71.93 ± $2.85
Range: [$69.09, $74.78]
Step 3: $72.10 ± $3.48
Range: [$68.61, $75.58]
Step 4: $72.55 ± $4.02
Range: [$68.53, $76.58]
Step 5: $73.27 ± $4.50
Range: [$68.78, $77.77]
In [83]:
# Save multi-step forecast
df_multi_step.to_csv('ftsmc_multi_step_forecast.csv', index=False)
print("\n💾 Multi-step forecast saved to 'ftsmc_multi_step_forecast.csv'")
print("\n" + "="*100)
print("✅ MULTI-STEP FORECASTING ANALYSIS COMPLETE!")
print("="*100)
print("\n⚠️ IMPORTANT NOTES:")
print(" 1. Multi-step forecasts use PREDICTED values as input for subsequent steps")
print(" 2. Uncertainty INCREASES with each step (cumulative error propagation)")
print(" 3. Step 1 is most reliable, Step 5 has highest uncertainty")
print(" 4. Actual performance can only be validated when real data becomes available")
print(" 5. Consider these forecasts as TREND INDICATORS rather than precise predictions")
print("\n📁 Generated Files for Multi-Step Forecast:")
print(" 1. ftsmc_multi_step_forecast.csv - Detailed 5-step forecast data")
print(" 2. ftsmc_multi_step_forecast.png - Complete visualization")
print(" 3. ftsmc_multi_step_analysis.png - Trajectory analysis")
print(" 4. ftsmc_state_transitions.png - Fuzzy state transitions")
print("\n" + "="*100)
print("🎉 COMPLETE CLARA-FTSMC MODELING AND FORECASTING FINISHED!")
print("="*100)
💾 Multi-step forecast saved to 'ftsmc_multi_step_forecast.csv' ==================================================================================================== ✅ MULTI-STEP FORECASTING ANALYSIS COMPLETE! ==================================================================================================== ⚠️ IMPORTANT NOTES: 1. Multi-step forecasts use PREDICTED values as input for subsequent steps 2. Uncertainty INCREASES with each step (cumulative error propagation) 3. Step 1 is most reliable, Step 5 has highest uncertainty 4. Actual performance can only be validated when real data becomes available 5. Consider these forecasts as TREND INDICATORS rather than precise predictions 📁 Generated Files for Multi-Step Forecast: 1. ftsmc_multi_step_forecast.csv - Detailed 5-step forecast data 2. ftsmc_multi_step_forecast.png - Complete visualization 3. ftsmc_multi_step_analysis.png - Trajectory analysis 4. ftsmc_state_transitions.png - Fuzzy state transitions ==================================================================================================== 🎉 COMPLETE CLARA-FTSMC MODELING AND FORECASTING FINISHED! ====================================================================================================
In [ ]: